Abstract.

We consider $n \times n$ Hermitian matrices with i.i.d. entries $X_{ij}$ whose tail probabilities $\mathbb{P}(|X_{ij}| \geq t)$ behave like $e^{-a t^{\alpha}}$ for some $a > 0$ and $\alpha \in (0,2)$. We establish a large deviations principle for the empirical spectral measure of $X/\sqrt{n}$ with speed $n^{1+\alpha/2}$ and with a good rate function $J(\mu)$ that is finite only if $\mu$ is of the form $\mu = \mu_{sc} \boxplus \nu$ for some probability measure $\nu$ on $\mathbb{R}$, where $\boxplus$ denotes the free convolution and $\mu_{sc}$ is Wigner's semicircle law. We obtain explicit expressions for $J(\mu_{sc} \boxplus \nu)$ in terms of the $\alpha$-th moment of $\nu$. The proof is based on the analysis of large deviations for the empirical distribution of very sparse random rooted networks.



1. Introduction 

Let $\mathcal{H}_n(\mathbb{C})$ denote the set of $n \times n$ hermitian matrices. The empirical spectral measure of a matrix $A \in \mathcal{H}_n(\mathbb{C})$ is the probability measure on $\mathbb{R}$ defined by

\[ \mu_A = \frac{1}{n} \sum_{k=1}^{n} \delta_{\lambda_k(A)}, \]

where $\lambda_1(A) \geq \dots \geq \lambda_n(A)$ denote the eigenvalues of $A$ counting multiplicity. Below, we consider the empirical spectral measure of a Wigner random matrix $X$ described as follows. Let $(X_{ij})_{1 \leq i < j}$ be i.i.d. complex random variables with variance $\mathbb{E}|X_{12} - \mathbb{E}X_{12}|^2 = 1$, and let $(X_{ii})_{i \geq 1}$ be i.i.d. real random variables. Extend this array by setting $X_{ij} = \overline{X}_{ji}$ for $1 \leq j < i$, and consider the sequence of $n \times n$ Hermitian random matrices

\[ X(n) = (X_{ij})_{1 \leq i,j \leq n}. \tag{1} \]

For ease of notation, we often drop the argument $n$ and simply write $X$ for $X(n)$.

The space $\mathcal{P}(\mathbb{R})$ of probability measures on $\mathbb{R}$ is endowed with the topology of weak convergence: a sequence of probability measures $(\mu_n)_{n \geq 1}$ converges weakly to $\mu$ if for any bounded continuous function $f : \mathbb{R} \to \mathbb{R}$, $\int f \, d\mu_n \to \int f \, d\mu$ as $n$ goes to infinity. We denote this convergence by $\mu_n \rightsquigarrow \mu$. Wigner's celebrated theorem asserts that almost surely,

\[ \mu_{X/\sqrt{n}} \rightsquigarrow \mu_{sc}, \tag{2} \]

where $\mu_{sc}$ is the semicircle law, i.e. the probability measure with density $\frac{1}{2\pi}\sqrt{4 - x^2}$ on $[-2,2]$; see e.g. [lllSllIl].

We consider large deviations, i.e. events of the form $\mu_{X/\sqrt{n}} \in B$ where $B$ is a measurable set in $\mathcal{P}(\mathbb{R})$ whose closure does not contain the limiting law $\mu_{sc}$. Clearly, (2) implies that $\mathbb{P}(\mu_{X/\sqrt{n}} \in B) \to 0$, $n \to \infty$. It follows from known concentration estimates that if the entries $X_{ij}$ are bounded, or if they satisfy a logarithmic Sobolev inequality, then $\mathbb{P}(\mu_{X/\sqrt{n}} \in B)$ decays to $0$ as fast as $e^{-cn^2}$ for some constant $c > 0$; see Guionnet and Zeitouni [11], or [3]. Further, if the $X_{ij}$ have a gaussian law such that $X$ belongs to the gaussian unitary ensemble GUE or the gaussian orthogonal ensemble GOE, then a full large deviations principle for $\mu_{X/\sqrt{n}}$ with speed $n^2$ has been established by Ben Arous and Guionnet in [5]. However, apart from the GUE and GOE cases, we are not aware of any case for which the large deviations principle for $\mu_{X/\sqrt{n}}$ has been obtained.

In this paper we prove a large deviations principle under the assumption that $X_{ij}$ has tail probabilities $\mathbb{P}(|X_{ij}| \geq t)$ of order $e^{-a t^{\alpha}}$ for some $a > 0$ and $\alpha \in (0,2)$. Before stating our assumptions and results in detail, let us make some preliminary remarks.

By considering events of the form $|X_{ij}| \sim \sqrt{n}$, $(i,j) \in I$, for suitable sets $I$ of pairs of indices, it is not hard to see that a nontrivial large deviation can be achieved with probability at least as large as $e^{-c n^{1+\alpha/2}}$, for some $c > 0$. For instance, the case when $I$ is the diagonal $i = 1, \dots, n$ can be used to produce a global shift of the spectral measure $\mu_{sc}$ at a cost

\[ -\log \mathbb{P}\big(|X_{ii}| \sim \sqrt{n},\ i = 1, \dots, n\big) = O(n^{1+\alpha/2}) \]

on the exponential scale. Similarly, one expects to be able to produce more general deformations of $\mu_{sc}$ at a cost of order $n^{1+\alpha/2}$. It turns out that this picture is correct, provided the deformations of $\mu_{sc}$ are of the form $\mu = \mu_{sc} \boxplus \nu$ for some $\nu \in \mathcal{P}(\mathbb{R})$, where $\boxplus$ denotes the free convolution. Roughly speaking, the idea is that the entries of $X$ that are visible on a scale $\sqrt{n}$ form a very sparse weighted random graph or random network $G_n$ that is asymptotically independent from the rest of the matrix, and a large deviations principle for $\mu_{X/\sqrt{n}}$ can be deduced from a large deviations principle for the law of the random network $G_n$. This approach also allows us to obtain explicit expressions for the rate function.

The strategy of proof developed in the present work for Wigner matrices could certainly be generalized to other models such as random covariance matrices or random band matrices with the same type of tail assumptions on the entries. We also believe that our strategy might extend to other tail assumptions such as power laws $\mathbb{P}(|X_{ij}| \geq t) \sim t^{-\alpha}$ with exponent $\alpha > 2$. The analysis of large deviations for the associated random network is however more delicate in this case.

Main result. We recall that a sequence of random variables $(Z_n)_{n \geq 1}$ with values in a topological space $\mathcal{X}$ with Borel $\sigma$-field $\mathcal{B}$ satisfies the large deviations principle (LDP) with rate function $J$ and speed $v$ if $J : \mathcal{X} \to [0, \infty]$ is a lower semi-continuous function, $v : \mathbb{N} \to [0, \infty)$ is a function which increases to infinity, and for every $B \in \mathcal{B}$:

\[ -\inf_{x \in B^{\circ}} J(x) \leq \liminf_{n \to \infty} \frac{1}{v(n)} \log \mathbb{P}(Z_n \in B) \leq \limsup_{n \to \infty} \frac{1}{v(n)} \log \mathbb{P}(Z_n \in B) \leq -\inf_{x \in \overline{B}} J(x), \tag{3} \]

where $B^{\circ}$ denotes the interior of $B$ and $\overline{B}$ denotes the closure of $B$. We recall that the lower semi-continuity of $J$ means that its level sets $\{x \in \mathcal{X} : J(x) \leq t\}$ are closed for all $t \geq 0$. When the level sets are compact the rate function $J$ is said to be good.

We now introduce our statistical assumption. Let $a, \alpha \in (0, \infty)$. We say that a complex random variable $Y$ belongs to the class $\mathcal{S}_{\alpha}(a)$, and write $Y \in \mathcal{S}_{\alpha}(a)$, if

\[ \lim_{t \to \infty} -t^{-\alpha} \log \mathbb{P}(|Y| \geq t) = a, \tag{4} \]

and if $Y/|Y|$ and $|Y|$ are independent for large values of $|Y|$, i.e. there exist $t_0 > 0$ and a probability $\vartheta \in \mathcal{P}(\mathbb{S}^1)$ on the unit circle $\mathbb{S}^1$ such that for all $t \geq t_0$ and all measurable sets $U \subset \mathbb{S}^1$, one has

\[ \mathbb{P}\big(Y/|Y| \in U \text{ and } |Y| \geq t\big) = \vartheta(U)\, \mathbb{P}(|Y| \geq t). \tag{5} \]

For instance, if $Y$ is Weibull, i.e. $Y$ is a nonnegative random variable with distribution function $F(t) = 1 - e^{-a t^{\alpha}}$, with $a > 0$ and $\alpha > 0$, then $Y \in \mathcal{S}_{\alpha}(a)$, with $\vartheta = \delta_1$, the unit mass at the point $1$. Clearly, if $Y \in \mathcal{S}_{\alpha}(a)$ is real valued, then the associated measure $\vartheta$ must have support in $\{-1, 1\}$. Moreover, for all $\alpha > 0$ we write $Y \in \mathcal{S}_{\alpha}(\infty)$ whenever (4) holds with $a = \infty$. Thus, with the above notation one has that if $Y$ is subgaussian then $Y \in \mathcal{S}_{\beta}(\infty)$ for all $\beta \in (0, 2)$, and if $Y \in \mathcal{S}_{\alpha}(a)$ for some $a, \alpha > 0$, then $Y \in \mathcal{S}_{\beta}(\infty)$ for all $\beta \in (0, \alpha)$.

Throughout the paper, we assume that the array $X_{ij}$ is given as above, i.e. $X_{ij}$, $i < j$, are i.i.d. copies of a complex random variable $X_{12}$ with unit variance, and $X_{ii}$ are i.i.d. copies of a real random variable $X_{11}$. Moreover, the following main assumption will always be understood without explicit mention.

Assumption 1. There exist $\alpha \in (0, 2)$ and $a, b \in (0, \infty]$ such that $X_{12} \in \mathcal{S}_{\alpha}(a)$ and $X_{11} \in \mathcal{S}_{\alpha}(b)$.

The main result can be formulated as follows.

Theorem 1.1. The measures $\mu_{X/\sqrt{n}}$ satisfy the LDP with speed $n^{1+\alpha/2}$ and good rate function

\[ J(\mu) = \begin{cases} \Phi(\nu) & \text{if } \mu = \mu_{sc} \boxplus \nu \text{ for some } \nu \in \mathcal{P}(\mathbb{R}), \\ \infty & \text{otherwise}, \end{cases} \tag{6} \]

where $\Phi : \mathcal{P}(\mathbb{R}) \to [0, \infty]$ is a good rate function.

The proof of Theorem 1.1 consists of two main parts. The first part, the "random matrix theory part" of the work, is discussed in Section 2. Here, we show that at speed $n^{1+\alpha/2}$ the large deviations are governed by the sparse $n \times n$ random matrix $C = C(n)$ defined by

\[ C_{ij} = \begin{cases} \dfrac{X_{ij}}{\sqrt{n}} & \text{if } \varepsilon(n) \leq \dfrac{|X_{ij}|}{\sqrt{n}} \leq \varepsilon(n)^{-1}, \\ 0 & \text{otherwise}, \end{cases} \]

where $\varepsilon(n)$ is a cutoff sequence that for convenience will be set equal to $1/\log n$. In particular, we show that as far as the LDP with speed $n^{1+\alpha/2}$ is concerned, $\mu_{X/\sqrt{n}}$ behaves as $\mu_{sc} \boxplus \mu_C$, where $\mu_C$ is the spectral measure of the matrix $C$; see Proposition 2.1 below. As a consequence, the LDP for $\mu_{X/\sqrt{n}}$ can be obtained by contraction if one has the LDP for $\mu_C$ with speed $n^{1+\alpha/2}$ and rate function $\Phi$.

The second part, the "random graph theory part" of the work, is presented in Section 3. Here, we prove the above mentioned LDP for the spectral measures $\mu_C$. This requires the analysis of large deviations for sparse random networks, and some use of the theory of local convergence for random networks that was recently developed by Benjamini and Schramm [6], Aldous and Steele [2], and Aldous and Lyons [1]. Let us briefly sketch the main ideas. Let $G_n$ be the sparse random network naturally associated to the $n \times n$ matrix $C$, and let $\rho_n$ denote the law of the equivalence class (under rooted isomorphisms) of the connected component of $G_n$ at the root, when the root is chosen uniformly at random. The law $\rho_n$ is regarded as an element of the space $\mathcal{P}(\mathcal{G}_*)$ of probability measures on $\mathcal{G}_*$, where $\mathcal{G}_*$ is the space of equivalence classes of connected rooted networks. We introduce a suitable weak topology on $\mathcal{P}(\mathcal{G}_*)$, and prove that the measures $\rho_n$ satisfy a LDP with speed $n^{1+\alpha/2}$ and a good rate function $I(\rho)$. The latter is finite only if $\rho$ belongs to the so called sofic measures, i.e. if $\rho$ is a limit of finite networks, and if the support of $\rho$ satisfies some natural constraints. We call $\mathcal{P}_s(\mathcal{G}_*)$ the set of such probability measures. We find that for $\rho \in \mathcal{P}_s(\mathcal{G}_*)$, one has

\[ I(\rho) = b\, \mathbb{E}_{\rho} |\omega_G(o)|^{\alpha} + \frac{a}{2}\, \mathbb{E}_{\rho} \sum_{v \in V_G \setminus o} |\omega_G(o,v)|^{\alpha}, \tag{7} \]

where $\mathbb{E}_{\rho}$ denotes expectation w.r.t. $\rho$, the law of the equivalence class of a connected rooted network $(G, o)$, $o$ denoting the root; $\omega_G(o)$ denotes the weight of the loop at the root, and $\omega_G(o,v)$ denotes the weight of the edge $(o, v)$ if $v$ is an element of the vertex set $V_G$ of the network. We refer to Proposition 3.9 for the precise result.

It turns out that the choice of a "myopic" topology on $\mathcal{P}(\mathcal{G}_*)$ is crucial to have the desired result. On the other hand we want this topology to be fine enough to have that the map $\rho \mapsto \mu_{\rho}$ defining the "spectral measure" associated to $\rho$ is continuous. If all this is satisfied, then a LDP for the spectral measure $\mu_C = \mu_{\rho_n}$ can be obtained by contraction from the LDP for $\rho_n$; see Proposition 3.13. In particular, we find that the function $\Phi$ in Theorem 1.1 is given by

\[ \Phi(\nu) = \inf\{ I(\rho),\ \rho \in \mathcal{P}_s(\mathcal{G}_*) : \mu_{\rho} = \nu \}. \tag{8} \]

We turn to more explicit characterizations of the rate function in Theorem 1.1. First, the rate function $\Phi$ depends on the laws of $X_{11}$ and $X_{12}$ only through $\alpha$, $a$, $b$ and the supports of the associated measures on $\mathbb{S}^1$. While the variational principle (8) is not always explicitly solvable, there is a large class of $\nu \in \mathcal{P}(\mathbb{R})$ for which $\Phi(\nu)$ can be computed. This allows us to give explicit expressions for the rate function $J(\mu)$ in Theorem 1.1. Recall that the free convolution with $\mu_{sc}$ is injective: for any $\mu \in \mathcal{P}(\mathbb{R})$ there is at most one $\nu \in \mathcal{P}(\mathbb{R})$ such that $\mu = \mu_{sc} \boxplus \nu$. Let $\mathcal{P}_{sym}(\mathbb{R})$ denote the set of symmetric probability measures on $\mathbb{R}$. If $\mu = \mu_{sc} \boxplus \nu$, then $\mu \in \mathcal{P}_{sym}(\mathbb{R})$ is equivalent to $\nu \in \mathcal{P}_{sym}(\mathbb{R})$. For more details on free convolution with the semi-circular distribution, we refer to Biane [7]. For $\nu \in \mathcal{P}(\mathbb{R})$ we use the notation

\[ m_{\alpha}(\nu) = \int |x|^{\alpha} \, d\nu(x) \tag{9} \]

for the $\alpha$-th moment of $\nu$. If $X_{11} \in \mathcal{S}_{\alpha}(b)$ for some $b < \infty$, then we write $\vartheta_b$ for the associated measure on $\{-1, 1\}$. The following theorem summarizes the main facts we can establish about the rate function.

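Theorem 1.2. a) For any $\nu \in \mathcal{P}(\mathbb{R})$,

\[ \Phi(\nu) \geq \Big(\frac{a}{2} \wedge b\Big)\, m_{\alpha}(\nu). \]

b) If $\mathrm{supp}(\vartheta_b) = \{-1, 1\}$, then for any $\nu \in \mathcal{P}(\mathbb{R})$,

\[ \Phi(\nu) \leq b\, m_{\alpha}(\nu). \]

c) If $\mathrm{supp}(\vartheta_b) = \{-1, 1\}$ and $\nu \in \mathcal{P}_{sym}(\mathbb{R})$, then

\[ \Phi(\nu) = \Big(\frac{a}{2} \wedge b\Big)\, m_{\alpha}(\nu). \]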

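Some remarks about Theorem 1.2. Part a) shows clearly that $J$ is a good rate function and that $J(\mu) = 0$ is equivalent to $\mu = \mu_{sc}$. Concerning the remaining statements, the fact that the moments $m_{\alpha}(\nu)$ appear naturally in the rate function and the special role played by symmetric measures $\nu$ can be understood as follows. Let $D$ denote the diagonal matrix with entries $X_{11}, \dots, X_{nn}$ and, for $n$ even, let $A$ denote the block diagonal matrix with $2 \times 2$ blocks defined by $A_{2i-1,2i} = X_{i,i+1}$, $A_{2i,2i-1} = \overline{X}_{i,i+1}$, $i = 1, \dots, n/2$, and with $A_{i,j} = 0$ for all other entries. Since each block has eigenvalues $\pm |X_{i,i+1}|$, it is straightforward to see that the empirical spectral measures of $D/\sqrt{n}$ and $A/\sqrt{n}$ are given by

\[ \mu_{D/\sqrt{n}} = \frac{1}{n} \sum_{i=1}^{n} \delta_{X_{ii}/\sqrt{n}}, \qquad \mu_{A/\sqrt{n}} = \frac{1}{n} \sum_{i=1}^{n/2} \Big( \delta_{|X_{i,i+1}|/\sqrt{n}} + \delta_{-|X_{i,i+1}|/\sqrt{n}} \Big). \]

Our results will show in particular that if the variables $X_{ij}$ are as in Assumption 1 and $\mathrm{supp}(\vartheta_b) = \{-1, 1\}$, then:

1) $\mu_{D/\sqrt{n}}$ satisfies a LDP on $\mathcal{P}(\mathbb{R})$ with speed $n^{1+\alpha/2}$ and rate function $I(\nu) = b\, m_{\alpha}(\nu)$, for all $\nu \in \mathcal{P}(\mathbb{R})$;

2) $\mu_{A/\sqrt{n}}$ satisfies a LDP on $\mathcal{P}(\mathbb{R})$ with speed $n^{1+\alpha/2}$ and rate function equal to $I(\nu) = \frac{a}{2}\, m_{\alpha}(\nu)$ for all $\nu \in \mathcal{P}_{sym}(\mathbb{R})$, and $I(\nu) = +\infty$ if $\nu \notin \mathcal{P}_{sym}(\mathbb{R})$.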



The statements above can be seen as extremal instances of Sanov's theorem for variables with exponential tails of the form (4). Thus, roughly speaking, part b) in Theorem 1.2 says that for $\mu_{X/\sqrt{n}}$ it is always possible to realize a deviation $\mu_{sc} \boxplus \nu$ by tilting diagonal entries only, i.e. using the deviation $\nu$ for $\mu_{D/\sqrt{n}}$. When $b \leq a/2$, this is sharp, and indeed part a) and part b) above yield the expression $\Phi(\nu) = b\, m_{\alpha}(\nu)$ in this case. Similarly, to illustrate part c), observe that if $\nu \in \mathcal{P}_{sym}(\mathbb{R})$, then the deviation $\mu_{sc} \boxplus \nu$ can always be achieved by tilting either the diagonal or the off-diagonal entries, i.e. using either $\mu_{D/\sqrt{n}}$ or $\mu_{A/\sqrt{n}}$. This reasoning produces the bound $\Phi(\nu) \leq (a/2 \wedge b)\, m_{\alpha}(\nu)$. The general bound in part a) then shows that this is actually the best strategy.
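
The bounds discussed above are easy to evaluate for a finitely supported $\nu$. The following is a minimal sketch, not part of the original argument: it assumes $\nu$ is given as a list of `(atom, weight)` pairs, that $a$ and $b$ are finite, and that $\mathrm{supp}(\vartheta_b) = \{-1, 1\}$. The function names `alpha_moment`, `is_symmetric` and `phi_bounds` are hypothetical.

```python
def alpha_moment(atoms, alpha):
    """m_alpha(nu) = sum_k w_k |x_k|^alpha for nu = sum_k w_k delta_{x_k}."""
    total = sum(w for _, w in atoms)
    if abs(total - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    return sum(w * abs(x) ** alpha for x, w in atoms)


def is_symmetric(atoms, tol=1e-12):
    """Check nu(-x) = nu(x) for every atom x."""
    mass = {}
    for x, w in atoms:
        mass[x] = mass.get(x, 0.0) + w
    return all(abs(mass.get(-x, 0.0) - w) <= tol for x, w in mass.items())


def phi_bounds(atoms, alpha, a, b):
    """Return (lower, upper) bounds on Phi(nu), assuming supp(theta_b) = {-1, 1}.

    lower: part a), (a/2 ^ b) m_alpha(nu).
    upper: part c) if nu is symmetric (then it equals lower), else part b).
    """
    m = alpha_moment(atoms, alpha)
    lower = min(a / 2.0, b) * m
    upper = lower if is_symmetric(atoms) else b * m
    return lower, upper
```

For a symmetric $\nu$ the two bounds coincide, which is the content of part c); for a non-symmetric $\nu$ the sketch only returns the interval given by parts a) and b).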

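If the support of $\vartheta_b$ is only $\{+1\}$ (or $\{-1\}$) then the above scenario changes in that one can use the diagonal matrix $D$ only to reach deviations $\nu$ whose support is in $\mathbb{R}_+$ (or $\mathbb{R}_-$). In this case we have the following estimates. Without loss of generality, we restrict to $\mathrm{supp}(\vartheta_b) = \{+1\}$.

Theorem 1.3. Suppose $\mathrm{supp}(\vartheta_b) = \{+1\}$.

a) If $\mathrm{supp}(\nu) \subset \mathbb{R}_+$, then

\[ \Big(\frac{a}{2} \wedge b\Big)\, m_{\alpha}(\nu) \leq \Phi(\nu) \leq b\, m_{\alpha}(\nu). \]

b) Suppose $\alpha \in (1,2)$. If $\nu \in \mathcal{P}_{sym}(\mathbb{R})$, then

\[ \Phi(\nu) = \frac{a}{2}\, m_{\alpha}(\nu). \]

c) Suppose $\alpha \in (1,2)$. If $\int x \, d\nu(x) < 0$, then $\Phi(\nu) = +\infty$.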

The above result can be interpreted as before by appealing to the large deviations of $\mu_{D/\sqrt{n}}$ and $\mu_{A/\sqrt{n}}$. In particular, part b) shows that since one cannot realize a symmetric deviation $\nu \in \mathcal{P}_{sym}(\mathbb{R})$ using the matrix $D$ only, it is less costly to realize it using the matrix $A$ only. Similarly, in part c) one has that neither $D$ nor $A$, nor any other matrix with vanishing trace, can be used to produce a measure $\nu$ with $\int x \, d\nu(x) < 0$, and therefore the rate function must be $+\infty$. We believe that the results in parts b) and c) above should hold without the additional condition $\alpha \in (1,2)$.

The proofs of Theorems 1.2 and 1.3 are given in Subsection 3.10.

2. Exponential equivalences 

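Throughout the rest of the paper, we fix the cutoff sequence $\varepsilon(n)$ as

\[ \varepsilon(n) = \frac{1}{\log n}. \tag{10} \]

For ease of notation, we often write simply $\varepsilon$ in place of $\varepsilon(n)$. We decompose the matrix $X$ as

\[ \frac{X}{\sqrt{n}} = A + B + C + D, \tag{11} \]

where the matrices $A, B, C, D$ are defined by

\[ A_{ij} = \mathbf{1}\big(|X_{ij}| < (\log n)^{2/\alpha}\big) \frac{X_{ij}}{\sqrt{n}}, \qquad B_{ij} = \mathbf{1}\big((\log n)^{2/\alpha} \leq |X_{ij}| < \varepsilon n^{1/2}\big) \frac{X_{ij}}{\sqrt{n}}, \]

\[ C_{ij} = \mathbf{1}\big(\varepsilon n^{1/2} \leq |X_{ij}| \leq \varepsilon^{-1} n^{1/2}\big) \frac{X_{ij}}{\sqrt{n}}, \qquad D_{ij} = \mathbf{1}\big(\varepsilon^{-1} n^{1/2} < |X_{ij}|\big) \frac{X_{ij}}{\sqrt{n}}. \]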
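
The decomposition $X/\sqrt{n} = A + B + C + D$ sorts each entry into exactly one of four magnitude bands. The following is a minimal sketch of that sorting for a single entry, under the assumption that $n$ is large enough for the bands to be ordered, i.e. $(\log n)^{2/\alpha} \leq \varepsilon n^{1/2}$ with $\varepsilon = 1/\log n$; the function name `decompose_entry` is hypothetical.

```python
import math


def decompose_entry(x, n, alpha):
    """Split x / sqrt(n) into its (A, B, C, D) parts; at most one is nonzero.

    Assumes (log n)^(2/alpha) <= eps * sqrt(n), with eps = 1 / log n.
    """
    eps = 1.0 / math.log(n)                 # cutoff sequence eps(n)
    small = math.log(n) ** (2.0 / alpha)    # threshold (log n)^(2/alpha)
    root = math.sqrt(n)
    r = abs(x)                              # works for real or complex x
    scaled = x / root
    if r < small:
        return (scaled, 0.0, 0.0, 0.0)      # bulk entries: A
    if r < eps * root:
        return (0.0, scaled, 0.0, 0.0)      # moderately large entries: B
    if r <= root / eps:
        return (0.0, 0.0, scaled, 0.0)      # entries visible on scale sqrt(n): C
    return (0.0, 0.0, 0.0, scaled)          # very large entries: D
```

Applying `decompose_entry` to every entry of a sample of $X$ yields the four matrices; only the $C$ part survives in the rate function.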

We define the distance on $\mathcal{P}(\mathbb{R})$ as

\[ d(\mu, \nu) = \sup\{ |g_{\mu}(z) - g_{\nu}(z)| : \Im m(z) \geq 2 \}, \tag{12} \]

where $g_{\mu}$ is the Cauchy-Stieltjes transform of $\mu$, i.e. for $z \in \mathbb{C}_+ = \{z \in \mathbb{C} : \Im m(z) > 0\}$,

\[ g_{\mu}(z) = \int \frac{\mu(dx)}{x - z}. \tag{13} \]

Recall that this distance is a metric for the weak convergence, see e.g. [3, Theorem 2.4.4]. Let also $d_{KS}$ denote the Kolmogorov-Smirnov distance and let $W_1$ denote the $L^1$-Wasserstein distance, see Section B below for the relevant definitions. From (72) and (74) one has

\[ d(\mu, \nu) \leq d_{KS}(\mu, \nu) \wedge W_1(\mu, \nu). \tag{14} \]
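
For finitely supported measures the Cauchy-Stieltjes transform is a finite sum, so the distance $d$ can be probed numerically. The sketch below is an illustration only: it assumes measures are given as lists of `(atom, weight)` pairs, and since the supremum over $\{\Im m(z) \geq 2\}$ is replaced by a maximum over finitely many user-supplied points, it returns a lower bound on $d(\mu, \nu)$. The names `stieltjes` and `d_lower_bound` are hypothetical.

```python
def stieltjes(atoms, z):
    """g_mu(z) = sum_k w_k / (x_k - z) for mu = sum_k w_k delta_{x_k}."""
    return sum(w / (x - z) for x, w in atoms)


def d_lower_bound(mu, nu, points):
    """Max of |g_mu(z) - g_nu(z)| over the given points with Im z >= 2.

    This is only a lower bound on d(mu, nu), which is a supremum over the
    whole region. Raises ValueError if no point has Im z >= 2.
    """
    usable = [z for z in points if z.imag >= 2]
    return max(abs(stieltjes(mu, z) - stieltjes(nu, z)) for z in usable)
```

For example, with $\mu = \delta_0$, $\nu = \delta_1$ and the single point $z = 2i$, the value is $1/(2\sqrt{5}) \approx 0.224$, which is consistent with the bound $d \leq d_{KS} \wedge W_1 = 1$ for this pair.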

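The following proposition is the first major step on the way to prove Theorem 1.1.

Proposition 2.1. The random probability measures $\mu_{sc} \boxplus \mu_C$ and $\mu_{X/\sqrt{n}}$ are exponentially equivalent: for any $\delta > 0$,

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( d(\mu_{X/\sqrt{n}}, \mu_{sc} \boxplus \mu_C) \geq \delta \big) = -\infty. \]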

2.1. Preliminary estimates. The strategy of proof of Proposition 2.1 is in 3 steps: we start by showing that the contribution of $D$ in (11) can be neglected (Lemma 2.2), then we show that $B$ can also be neglected (Lemma 2.3). The main step will then consist in proving that $\mu_{A+C}$ and $\mu_{sc} \boxplus \mu_C$ are exponentially equivalent.

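Lemma 2.2 (Very large entries). The random probability measures $\mu_{A+B+C}$ and $\mu_{X/\sqrt{n}}$ are exponentially equivalent: for any $\delta > 0$,

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( d(\mu_{X/\sqrt{n}}, \mu_{A+B+C}) \geq \delta \big) = -\infty. \]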

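Proof. From (14), it is sufficient to prove that for any $\delta > 0$,

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( d_{KS}(\mu_{X/\sqrt{n}}, \mu_{A+B+C}) \geq \delta \big) = -\infty. \]

Then, using the rank inequality, Lemma B.1, it is sufficient to prove that

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( \mathrm{rank}(D) \geq \delta n \big) = -\infty. \]

However, the rank is bounded by the number of non-zero entries of a matrix:

\[ \mathbb{P}\big( \mathrm{rank}(D) \geq 2\delta n \big) \leq \mathbb{P}\Big( \sum_{1 \leq i \leq j \leq n} \mathbf{1}\big(|X_{ij}| \geq \varepsilon^{-1} n^{1/2}\big) \geq \delta n \Big). \]

The Bernoulli variables $\mathbf{1}(|X_{ij}| \geq \varepsilon^{-1} n^{1/2})$, $1 \leq i \leq j \leq n$, are independent. Also, by Assumption 1, their mean value $p_{ij} = \mathbb{P}(|X_{ij}| \geq \varepsilon^{-1} n^{1/2})$ satisfies

\[ p_{ij} \leq p = e^{-c \varepsilon^{-\alpha} n^{\alpha/2}} \]

for some $c > 0$. For our choice of $\varepsilon$ in (10), $p = o(1/n)$. Hence it is sufficient to prove that

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\Big( \sum_{1 \leq i \leq j \leq n} \big( \mathbf{1}(|X_{ij}| \geq \varepsilon^{-1} n^{1/2}) - p_{ij} \big) \geq \delta n \Big) = -\infty. \]

Recall that from Bennett's inequality, if $W_i$, $i = 1, \dots, m$, are independent Bernoulli($p_i$) variables, and $h(x) = (x+1)\log(x+1) - x$, then one has

\[ \mathbb{P}\Big( \sum_{i=1}^{m} (W_i - p_i) \geq t \Big) \leq \exp\Big( -\sigma^2 h\Big( \frac{t}{\sigma^2} \Big) \Big), \tag{15} \]

with $\sigma^2 = \sum_{i=1}^{m} p_i (1 - p_i)$. In our case, for all $n$ large enough,

\[ \sigma^2 = \sum_{1 \leq i \leq j \leq n} p_{ij}(1 - p_{ij}) \leq \frac{n(n+1)\, p}{2}. \]

Therefore, using $h(x) \sim x \log x$ as $x \to \infty$,

\[ \mathbb{P}\Big( \sum_{1 \leq i \leq j \leq n} \big( \mathbf{1}(|X_{ij}| \geq \varepsilon^{-1} n^{1/2}) - p_{ij} \big) \geq \delta n \Big) \leq \exp\Big( -\sigma^2 h\Big( \frac{\delta n}{\sigma^2} \Big) \Big) \leq \exp\big( -c_0 n \log(1/(np)) \big), \]

for some constant $c_0 > 0$ depending on $\delta$. Now, since $n = o(p^{-1})$, we find that for some $c_1 > 0$, for all $n$ large enough the last expression is upper bounded by

\[ \exp\Big( \frac{c_0}{2}\, n \log p \Big) \leq \exp\big( -c_1 n^{1+\alpha/2} \varepsilon^{-\alpha} \big). \]

This proves the claim. □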
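
To get a feeling for Bennett's inequality as used in the proof above, the bound $\exp(-\sigma^2 h(t/\sigma^2))$ with $h(x) = (x+1)\log(x+1) - x$ can be evaluated directly for a list of Bernoulli parameters. This is a minimal sketch with hypothetical function names `h` and `bennett_bound`; it assumes $t \geq 0$ and treats the degenerate case $\sigma^2 = 0$ separately.

```python
import math


def h(x):
    """h(x) = (x + 1) log(x + 1) - x, for x >= 0."""
    return (x + 1.0) * math.log1p(x) - x


def bennett_bound(probs, t):
    """Upper bound on P(sum_i (W_i - p_i) >= t) for independent Bernoulli(p_i).

    Uses sigma^2 = sum_i p_i (1 - p_i) and the bound exp(-sigma^2 h(t / sigma^2)).
    """
    sigma2 = sum(p * (1.0 - p) for p in probs)
    if sigma2 == 0.0:
        # The sum is deterministic and equal to its mean.
        return 0.0 if t > 0 else 1.0
    return math.exp(-sigma2 * h(t / sigma2))
```

With four fair coins and $t = 1$ one has $\sigma^2 = 1$ and the bound equals $e/4 \approx 0.68$, to be compared with the exact probability $5/16$. In the proof the relevant regime is $t/\sigma^2 \to \infty$, where $h(x) \sim x \log x$ makes the bound much sharper.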
We now show that the contribution of $B$ in (11) is also negligible.

Lemma 2.3 (Moderately large entries). The random probability measures $\mu_{A+C}$ and $\mu_{X/\sqrt{n}}$ are exponentially equivalent: for any $\delta > 0$,

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( d(\mu_{X/\sqrt{n}}, \mu_{A+C}) \geq \delta \big) = -\infty. \]



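Proof. By Lemma 2.2 and the triangle inequality, it is sufficient to check that for any $\delta > 0$,

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( W_2(\mu_{A+B+C}, \mu_{A+C}) \geq \delta \big) = -\infty, \]

where $W_2 \geq W_1$ is the $L^2$-Wasserstein distance defined by (73). From the Hoffman-Wielandt inequality, Lemma B.2, it is sufficient to prove that for any $\delta > 0$,

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\Big( \frac{1}{n} \mathrm{tr}(B^2) \geq \delta \Big) = -\infty. \]

We write

\[ \frac{1}{n} \mathrm{tr}(B^2) \leq \frac{2}{n^2} \sum_{1 \leq i \leq j \leq n} |X_{ij}|^2\, \mathbf{1}\big( (\log n)^{2/\alpha} \leq |X_{ij}| \leq \varepsilon n^{1/2} \big). \]

Thus, from Chernoff's bound, for any $\lambda > 0$,

\[ \mathbb{P}\Big( \frac{1}{n} \mathrm{tr}(B^2) \geq 2\delta \Big) \leq e^{-\lambda \delta} \prod_{1 \leq i \leq j \leq n} \mathbb{E}\, e^{n^{-2} \lambda |X_{ij}|^2 \mathbf{1}( (\log n)^{2/\alpha} \leq |X_{ij}| \leq \varepsilon n^{1/2} )}. \]

To estimate the last expectation, we use the integration by parts formula: for $\mu \in \mathcal{P}(\mathbb{R})$ and $g \in C^1$,

\[ \int_a^b g(x)\, d\mu(x) = g(a)\mu([a, \infty)) - g(b)\mu((b, \infty)) + \int_a^b g'(x)\, \mu([x, \infty))\, dx. \tag{16} \]

Define the function

\[ f(x) = n^{-2} \lambda x^2 - c x^{\alpha}. \tag{17} \]

Let $\mu$ denote the law of $|X_{ij}|$, and $g(x) = e^{n^{-2} \lambda x^2}$. By Assumption 1 there exists a constant $c > 0$ such that

\[ \mu([t, \infty)) = \mathbb{P}(|X_{ij}| \geq t) \leq \exp(-c t^{\alpha}), \tag{18} \]

for all $t$ large enough. In particular, $g(t)\mu([t, \infty)) \leq e^{f(t)}$. From (16) it follows that

\[ \mathbb{E}\, e^{n^{-2} \lambda |X_{ij}|^2 \mathbf{1}( (\log n)^{2/\alpha} \leq |X_{ij}| \leq \varepsilon n^{1/2} )} \leq 1 + \int_{(\log n)^{2/\alpha}}^{\varepsilon n^{1/2}} g(x)\, d\mu(x) \leq 1 + e^{f((\log n)^{2/\alpha})} + \frac{\lambda \varepsilon^2}{n} \max_{x \in [(\log n)^{2/\alpha},\, \varepsilon n^{1/2}]} e^{f(x)}. \tag{19} \]

We choose $\lambda = \frac{1}{2} c\, \varepsilon^{\alpha - 2} n^{1+\alpha/2}$, with the constant $c > 0$ given in (18). Simple computations show that $f(x)$ reaches its maximum for $x \in [(\log n)^{2/\alpha}, \varepsilon n^{1/2}]$ at $x = (\log n)^{2/\alpha}$, where it is equal to

\[ \frac{1}{2} c\, \varepsilon^{\alpha - 2} n^{\alpha/2 - 1} (\log n)^{4/\alpha} - c (\log n)^2. \]

Using (10), for $n \geq n_0$ this is smaller than $-\frac{c}{2} (\log n)^2$. Therefore, using $1 + x \leq e^x$, $x \geq 0$, one has that (19) is bounded by $\exp\big( e^{-\frac{c}{4} (\log n)^2} \big)$ for $n$ large enough. It follows that

\[ \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\Big( \frac{1}{n} \mathrm{tr}(B^2) \geq 2\delta \Big) \leq -\frac{1}{2} c\, \delta\, \varepsilon^{\alpha - 2} + n^{1 - \alpha/2} e^{-\frac{c}{4} (\log n)^2}. \]

The desired conclusion follows. □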
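For $s > 0$, we define the compact set for the weak topology

\[ K_s = \Big\{ \mu \in \mathcal{P}(\mathbb{R}) : \int x^2 \, d\mu \leq s \Big\}. \]

For a suitable choice of $s$, we now check that $\mu_C$ is in $K_s$ with large probability.

Lemma 2.4 (Exponential tightness estimates).

\[ \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( \mu_C \notin K_{(\log n)^2} \big) = -\infty. \]

Moreover, if $I = \{ (i,j) : |X_{ij}| > (\log n)^{2/\alpha} \}$, for any $\delta > 0$,

\[ \lim_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( |I| \geq \delta n^{1+\alpha/2} \big) = -\infty. \]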

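Proof. Notice that

\[ \int x^2 \, d\mu_C = \frac{1}{n} \mathrm{tr}(C^2) \leq \frac{2}{n^2} \sum_{1 \leq i \leq j \leq n} |X_{ij}|^2\, \mathbf{1}\big( \varepsilon n^{1/2} \leq |X_{ij}| \leq \varepsilon^{-1} n^{1/2} \big). \]

We may repeat the argument in the proof of Lemma 2.3. This time we take $\lambda = \frac{1}{2} c\, \varepsilon^{2 - \alpha} n^{1+\alpha/2}$, where $c$ is as in (18), and then define $f$ as in (17). For any $s > 0$ one has

\[ \mathbb{P}\Big( \frac{1}{n} \mathrm{tr}(C^2) \geq 2s \Big) \leq e^{-\lambda s} \Big( 1 + e^{f(\varepsilon n^{1/2})} + \frac{c}{2}\, n^{\alpha/2} \varepsilon^{-\alpha} \max_{x \in [\varepsilon n^{1/2},\, \varepsilon^{-1} n^{1/2}]} e^{f(x)} \Big)^{n(n+1)/2}. \]

Simple considerations show that $f(x)$, for $x \in [\varepsilon n^{1/2}, \varepsilon^{-1} n^{1/2}]$, is maximized at $x = \varepsilon n^{1/2}$, where it satisfies $f(\varepsilon n^{1/2}) \leq -\frac{1}{2} c\, \varepsilon^{\alpha} n^{\alpha/2}$. This gives, for $n$ large enough,

\[ \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\Big( \frac{1}{n} \mathrm{tr}(C^2) \geq 2s \Big) \leq -\frac{1}{2} c\, \varepsilon^{2 - \alpha} s + o(1). \]

We choose finally $s = 1/(2\varepsilon^2)$. For our choice of $\varepsilon$ in (10), this implies the first claim.

For the second claim, we have

\[ |I| \leq 2 \sum_{1 \leq i \leq j \leq n} \mathbf{1}\big( |X_{ij}| \geq (\log n)^{2/\alpha} \big). \]

The Bernoulli variables $\mathbf{1}(|X_{ij}| \geq (\log n)^{2/\alpha})$, $1 \leq i \leq j \leq n$, are independent. Also, by Assumption 1, their average $p_{ij} = \mathbb{P}(|X_{ij}| \geq (\log n)^{2/\alpha})$ satisfies

\[ p_{ij} \leq p = e^{-c (\log n)^2} \]

for some $c > 0$. We argue as in the proof of Lemma 2.2. From Bennett's inequality (15),

\[ \mathbb{P}\Big( \sum_{1 \leq i \leq j \leq n} \big( \mathbf{1}(|X_{ij}| \geq (\log n)^{2/\alpha}) - p_{ij} \big) \geq \delta n^{1+\alpha/2} \Big) \leq \exp\Big( -c_0\, n^{1+\alpha/2} \log \frac{n^{\alpha/2 - 1}}{p} \Big) \]

for some constant $c_0 = c_0(\delta) > 0$. Since $p = o(n^{\alpha/2 - 1})$, this gives the claim. □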



2.2. Auxiliary estimates. To complete the proof of Proposition 2.1 we shall need two extra results. The first is due to Guionnet and Zeitouni [11, Corollary 1.4].

Theorem 2.5 (Concentration for matrices with bounded entries). Let $\kappa \geq 1$, let $Y \in \mathcal{H}_n(\mathbb{C})$ be a random matrix with independent entries $(Y_{ij})_{1 \leq i \leq j \leq n}$ bounded by $\kappa$, and let $M \in \mathcal{H}_n(\mathbb{C})$ be a deterministic matrix such that $\int x^2 \, d\mu_M \leq \kappa^2$. There exists a universal constant $c > 0$ such that for all $(c \kappa^2 / n)^{2/5} \leq t \leq 1$,

\[ \mathbb{P}\big( W_1(\mu_{Y/\sqrt{n} + M}, \mathbb{E}\mu_{Y/\sqrt{n} + M}) \geq t \big) \leq \frac{c \kappa}{t^{3/2}} \exp\Big( -\frac{n^2 t^5}{c \kappa^4} \Big). \]

In [11, Corollary 1.4], the result is stated for matrices $Y$ in $\mathcal{H}_n(\mathbb{C})$ such that the entries have independent real and imaginary parts. The extension to our setting follows by using a version of Talagrand's concentration inequality for independent bounded variables in $\mathbb{C}$. Also, the matrix $M$ is not present in [11]. It is however not hard to check that its presence does not change the argument in [11, page 132], since one can use the bound

\[ \int x^2 \, d\mu_{Y/\sqrt{n} + M} \leq 2 \int x^2 \, d\mu_{Y/\sqrt{n}} + 2 \int x^2 \, d\mu_M \leq 4\kappa^2. \]

The latter is an easy consequence of e.g. Lemma B.2.




The second result we need is a uniform bound on the rate of convergence of the empirical spectral measure of sums of random matrices.

Theorem 2.6 (Uniform asymptotic freeness). Let $Y = (Y_{ij})_{1 \leq i,j \leq n} \in \mathcal{H}_n(\mathbb{C})$ be a Wigner random matrix with $\mathrm{Var}(Y_{12}) = 1$, $\mathbb{E}|Y_{12}|^3 < \infty$ and $\mathbb{E}|Y_{11}|^2 < \infty$. There exists a universal constant $c > 0$ such that for any integer $n \geq 1$ and any $M \in \mathcal{H}_n(\mathbb{C})$,

\[ d\big( \mathbb{E}\mu_{Y/\sqrt{n} + M},\ \mu_{sc} \boxplus \mu_M \big) \leq c\, \frac{\sqrt{\mathbb{E}|Y_{11}|^2} + \mathbb{E}|Y_{12}|^3}{\sqrt{n}}. \]

A striking point of the above theorem is that the constant $c$ does not depend on $M$. The proof of Theorem 2.6 is given in Section A below. We are now ready to finish the proof of Proposition 2.1.



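2.3. Proof of Proposition 2.1. By Lemmas 2.2 and 2.3 it is sufficient to prove that $\mu_{A+C}$ and $\mu_{sc} \boxplus \mu_C$ are exponentially equivalent: for any $\delta > 0$,

\[ \lim_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( d(\mu_{sc} \boxplus \mu_C, \mu_{A+C}) \geq \delta \big) = -\infty. \tag{20} \]

Let $\mathcal{F}$ be the $\sigma$-algebra generated by the random variables

\[ \{ X_{ij} : (i,j) \text{ such that } |X_{ij}| \geq (\log n)^{2/\alpha} \}. \]

Then $C$ is $\mathcal{F}$-measurable and, given $\mathcal{F}$, $A$ is a random matrix with independent entries $(A_{ij})_{1 \leq i \leq j \leq n}$ bounded by $(\log n)^{2/\alpha}$. Define the event

\[ E = \Big\{ \int x^2 \, d\mu_C \leq (\log n)^2 \Big\}. \]

Then $E \in \mathcal{F}$. Lemma 2.4 implies that for some sequence $s_1(n) \to \infty$ and all $n$ large enough,

\[ \mathbb{P}(E^c) \leq e^{-s_1(n)\, n^{1+\alpha/2}}. \tag{21} \]

Also, using (14) and Theorem 2.5 applied to $\kappa = (\log n)^2 \vee (\log n)^{2/\alpha}$, for some sequence $s_2(n) \to \infty$, for all $n$ large enough,

\[ \mathbf{1}_E\, \mathbb{P}_{\mathcal{F}}\big( d(\mathbb{E}_{\mathcal{F}}\mu_{A+C}, \mu_{A+C}) > \delta/3 \big) \leq e^{-s_2(n)\, n^{1+\alpha/2}}, \tag{22} \]

where $\mathbb{P}_{\mathcal{F}}$ and $\mathbb{E}_{\mathcal{F}}$ are the conditional probability and expectation given $\mathcal{F}$. From (21) and (22), using the triangle inequality one has that (20) follows once we prove that for any $\delta > 0$:

\[ \lim_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( d(\mu_{sc} \boxplus \mu_C, \mathbb{E}_{\mathcal{F}}\mu_{A+C}) \geq \delta \big) = -\infty. \tag{23} \]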

We now use a coupling argument to remove the dependency between $A$ and $C$. Let $P_n$ be the law of $X_{12}$ conditioned on $\{ |X_{12}| < (\log n)^{2/\alpha} \}$, and $Q_n$ be the law of $X_{11}$ conditioned on $\{ |X_{11}| < (\log n)^{2/\alpha} \}$. We also define $I = \{ (i,j) : |X_{ij}| \geq (\log n)^{2/\alpha} \}$. Given $\mathcal{F}$, if $(i,j) \in I$, then $A_{ij} = 0$ while, if $(i,j) \notin I$ and $1 \leq i \leq j \leq n$, then $\sqrt{n} A_{ij}$ has conditional law $P_n$ or $Q_n$ depending on whether $i < j$ or $i = j$.

On our probability space, we now consider $Y$ an independent hermitian random matrix such that $(Y_{ij})_{1 \leq i \leq j \leq n}$ are independent, and for $1 \leq i \leq n$, $Y_{ii}$ has law $Q_n$, while for $1 \leq i < j \leq n$, $Y_{ij}$ has law $P_n$. We form the matrix

\[ A'_{ij} = \mathbf{1}\big( (i,j) \notin I \big) A_{ij} + \mathbf{1}\big( (i,j) \in I \big) \frac{Y_{ij}}{\sqrt{n}}. \]
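By construction, $\sqrt{n} A'$ and $Y$ have the same distribution and are independent of $\mathcal{F}$. Also, by Lemma B.2 and Jensen's inequality,

\[ \mathbb{E}_{\mathcal{F}}\, d(\mu_{A+C}, \mu_{A'+C}) \leq \mathbb{E}_{\mathcal{F}} \sqrt{ \frac{1}{n} \sum_{1 \leq i,j \leq n} |A_{ij} - A'_{ij}|^2 } \leq \sqrt{ \frac{1}{n^2} \sum_{(i,j) \in I} \mathbb{E}|Y_{ij}|^2 } \leq \frac{c_0 \sqrt{|I|}}{n}, \]

where we have used the fact that, for some constant $c_0 > 0$,

\[ \max\big( \mathbb{E}|Y_{11}|^2, \mathbb{E}|Y_{12}|^2 \big) \leq c_0^2. \]

Define the event

\[ F = \{ |I| \leq \delta^2 n^2 / c_0^2 \}. \]

Then $F \in \mathcal{F}$ and

\[ \mathbf{1}_F\, \mathbb{E}_{\mathcal{F}}\, d(\mu_{A+C}, \mu_{A'+C}) \leq \delta. \tag{24} \]

From Lemma 2.4, for some sequence $s_3(n) \to \infty$, for all $n$ large enough,

\[ \mathbb{P}(F^c) \leq e^{-s_3(n)\, n^{1+\alpha/2}}. \tag{25} \]

Observe that by definition of the distance (12),

\[ d\big( \mathbb{E}_{\mathcal{F}}\mu_{A+C}, \mathbb{E}_{\mathcal{F}}\mu_{A'+C} \big) \leq \mathbb{E}_{\mathcal{F}}\, d(\mu_{A+C}, \mu_{A'+C}). \]

Since $A'$ and $Y/\sqrt{n}$ have the same distribution, we deduce from (24), (25) and the triangle inequality that the proof of (23) can be reduced to the proof of

\[ \lim_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big( d(\mu_{sc} \boxplus \mu_C, \mathbb{E}_{\mathcal{F}}\mu_{Y/\sqrt{n} + C}) > \delta \big) = -\infty. \tag{26} \]

Clearly, $\mathbb{E}|Y_{12}|^3 \leq c_0 (\log n)^{2/\alpha}$ and $\mathbb{E}|Y_{12}|^2 \to 1$. Hence (26) follows immediately from the uniform estimate of Theorem 2.6 applied to $M = C$, which is $\mathcal{F}$-measurable. Indeed, Theorem 2.6 implies that for $\delta > 0$,

\[ d\big( \mu_{sc} \boxplus \mu_C, \mathbb{E}_{\mathcal{F}}\mu_{Y/\sqrt{n} + C} \big) \leq \delta \]

for all $n \geq n_0(\delta)$, where $n_0(\delta)$ is a constant depending only on $\delta$. This concludes the proof of Proposition 2.1.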



3. Large deviations of very sparse rooted networks 

In this section, we start by adapting to our setting the notion of local weak convergence of rooted networks, introduced in [6], [2], and [1]. Next, we introduce a suitable projective limit topology on the space of networks. Then we prove the LDP for the network $G_n$ induced by the very sparse matrix $C$. Finally, we introduce the spectral measure associated to a network and project the LDP for networks onto a LDP for spectral measures.
3.1. Locally finite hermitian networks. Let $V$ be a countable set, the vertex set. A pair $(u, v) \in V^2$ is an oriented edge. A network or weighted graph $G = (V, \omega)$ is a vertex set $V$ together with a map $\omega$ from $V^2$ to $\mathbb{C}$. We say that a network is hermitian if for all $(u,v) \in V^2$,

\[ \omega(u, v) = \overline{\omega(v, u)}. \]

For ease of notation, we sometimes set $\omega(v) = \omega(v, v)$ for the weight of the loop at $v$. The degree of $v$ in $G$ is defined by

\[ \deg(v) = \sum_{u \in V} |\omega(v, u)|^2. \]

The network $G$ is locally finite if for any vertex $v$, $\deg(v) < \infty$.

A path $\pi$ from $u$ to $v$ in $V$ is a sequence $\pi = (u_0, \dots, u_k)$ with $u_0 = u$, $u_k = v$ and, for $1 \leq i \leq k$, $|\omega(u_{i-1}, u_i)| > 0$. If such $\pi : u \to v$ exists, then one defines the $\ell^2$ distance

\[ D_{\pi}(u,v) = \Big( \sum_{i=1}^{k} |\omega(u_{i-1}, u_i)|^{-2} \Big)^{1/2}. \]

The distance between $u$ and $v$ is defined as

\[ D(u,v) = \inf_{\pi : u \to v} D_{\pi}(u,v). \]

Notice that weights are thought of as inverse of distances. If there is no path $\pi : u \to v$ then the distance $D(u, v)$ is set to be infinite. A network is connected if $D(u, v) < \infty$ for any $u \neq v \in V$.
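
Since $D_{\pi}(u,v)^2$ is a sum of nonnegative edge costs $|\omega|^{-2}$, minimizing $D_{\pi}$ over paths is a shortest-path problem, and for a finite network it can be solved with Dijkstra's algorithm. The sketch below is an illustration under that finiteness assumption; the representation of a network as a dictionary `{(u, v): weight}` containing both orientations of every edge, and the function name `network_distance`, are hypothetical.

```python
import heapq
import math


def network_distance(weights, source, target):
    """Distance D(source, target) in a finite network.

    weights maps oriented edges (u, v) to complex numbers; an edge is usable
    when its weight is nonzero, and its squared length is |weight|^(-2).
    Returns math.inf when no path exists.
    """
    adj = {}
    for (u, v), w in weights.items():
        if u != v and abs(w) > 0:           # loops never shorten a path
            adj.setdefault(u, []).append((v, abs(w) ** -2))
    best = {source: 0.0}                    # best known squared distances
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            return math.sqrt(cost)
        if cost > best.get(u, math.inf):
            continue                        # stale heap entry
        for v, c in adj.get(u, []):
            new = cost + c
            if new < best.get(v, math.inf):
                best[v] = new
                heapq.heappush(heap, (new, v))
    return math.inf
```

Heavy edges are short: in the usage below, going through two edges of weight 2 (squared length 1/4 each) beats a direct edge of weight 1/2 (squared length 4).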

All networks we consider below will be hermitian and locally finite, but not necessarily connected. We call $\mathcal{G}$ the set of all such networks. For a network $G \in \mathcal{G}$, to avoid possible confusion, we will often denote by $V_G$, $\omega_G$, $\deg_G$ the corresponding vertex set, weight and degree functions.

Clearly, any $n \times n$ hermitian matrix $H_n \in \mathcal{H}_n(\mathbb{C})$ defines a finite network $G = G(H_n)$ in a natural way, by taking

\[ V_G = \{1, \dots, n\}, \qquad \omega_G(i,j) = H_n(i,j). \tag{27} \]

For simplicity, we often write simply $H_n$ instead of $G(H_n)$.



3.2. Rooted networks. Below, a rooted network $(G, o) = (V, \omega, o)$ is a hermitian, locally finite and connected network $(V, \omega)$ with a distinguished vertex $o \in V$, the root. For $t > 0$, we denote by $(G, o)_t$ the rooted network with vertex set $\{u \in V : D(o, u) \leq t\}$, and with the weights induced by $\omega$. Two rooted networks $(G_i, o_i) = (V_i, \omega_i, o_i)$, $i \in \{1, 2\}$, are isomorphic if there exists a bijection $\sigma : V_1 \to V_2$ such that $\sigma(o_1) = o_2$ and $\sigma(G_1) = G_2$, where $\sigma$ acts on $G_1$ through $\sigma(u, v) = (\sigma(u), \sigma(v))$ and $\sigma(\omega) = \omega \circ \sigma$.

We define the semi-distance $d_{loc}$ between two rooted networks $(G_1, o_1)$ and $(G_2, o_2)$ to be

\[ d_{loc}\big( (G_1, o_1), (G_2, o_2) \big) = \frac{1}{1 + T}, \]

where $T$ is the supremum of those $t > 0$ such that there is a bijection $\sigma : V_{(G_1, o_1)_t} \to V_{(G_2, o_2)_t}$ with $\sigma(o_1) = o_2$ and such that the function $\omega_{G_1} - \omega_{G_2} \circ \sigma$ is bounded by $1/t$ on $V^2_{(G_1, o_1)_t}$.

The rooted network isomorphism defines a space $\mathcal{G}_*$ of equivalence classes of rooted networks $(G, o)$. On the space $\mathcal{G}_*$, $d_{loc}$ is a proper distance. The associated topology will be referred to as the local topology. We write $g$ for an element of $\mathcal{G}_*$. We shall denote the convergence on $(\mathcal{G}_*, d_{loc})$ by $d_{loc}(g_n, g) \to 0$ or $g_n \xrightarrow{loc} g$.
The space $(\mathcal{G}_*, d_{loc})$ is separable and complete [1]. Let $\mathcal{P}(\mathcal{G}_*)$ denote the space of probability measures on $\mathcal{G}_*$. For $\mu, \mu_n \in \mathcal{P}(\mathcal{G}_*)$, we write $\mu_n \overset{loc}{\rightsquigarrow} \mu$ when $\mu_n$ converges weakly, i.e. when $\int f \, d\mu_n \to \int f \, d\mu$ for every bounded continuous function $f$ on $(\mathcal{G}_*, d_{loc})$. This notion of weak convergence is often referred to as local weak convergence. See [1] for more details and examples.

For a network $G \in \mathcal{G}$ and $v \in V_G$, one writes $G(v)$ for the connected component of $G$ at $v$, i.e. the largest connected network $G' \subset G$ with $v \in V_{G'}$. If $G \in \mathcal{G}$ is finite, i.e. $V_G$ is finite, one defines the probability measure $U(G) \in \mathcal{P}(\mathcal{G}_*)$ as the law of the equivalence class of the rooted network $(G(o), o)$ where the root $o$ is sampled uniformly at random from $V_G$:

\[ U(G) = \frac{1}{|V_G|} \sum_{v \in V_G} \delta_{g(v)}, \]

where $g(v)$ stands for the equivalence class of $(G(v), v)$. If $G_n$, $n \geq 1$, is a sequence of finite networks from $\mathcal{G}$, we shall say that $G_n$ has local weak limit $\rho \in \mathcal{P}(\mathcal{G}_*)$ if $U(G_n) \overset{loc}{\rightsquigarrow} \rho$.

3.3. Sofic measures. Following [1], a measure $\rho \in \mathcal{P}(\mathcal{G}_*)$ is called sofic if there exists a sequence of finite networks $G_n$, $n \geq 1$, whose local weak limit is $\rho$. All sofic measures are unimodular; the converse is open, see [1]. We shall need to identify a subset of these measures. Let $\vartheta_a, \vartheta_b$ denote the laws of $X_{12}/|X_{12}|$ and $X_{11}/|X_{11}|$ respectively, for $X_{12} \in \mathcal{S}_{\alpha}(a)$ and $X_{11} \in \mathcal{S}_{\alpha}(b)$, see Assumption 1, and let $S_a, S_b \subset \mathbb{S}^1$ denote their supports. Let $\mathcal{A}_n \subset \mathcal{H}_n(\mathbb{C})$ be the set of $n \times n$ hermitian matrices $H$ such that either $H_{ij} = 0$ or $H_{ij}/|H_{ij}| \in S_a$ for all $i < j$, and such that either $H_{ii} = 0$ or $H_{ii}/|H_{ii}| \in S_b$ for all $i$. We say that $\rho \in \mathcal{P}(\mathcal{G}_*)$ is admissible sofic if there exists a sequence of matrices $H_n \in \mathcal{A}_n$ such that $U(H_n) \overset{loc}{\rightsquigarrow} \rho$, where $H_n$ is identified with the associated network $G(H_n)$ as in (27). We denote by $\mathcal{P}_s(\mathcal{G}_*)$ the set of admissible sofic probability measures. Measures in $\mathcal{P}_s(\mathcal{G}_*)$ will often be called simply sofic if no confusion can arise.

Let $g_{\emptyset}$ stand for the trivial network consisting of a single isolated vertex (the root) with zero weights. We refer to $g_{\emptyset}$ as the empty network. Clearly, the Dirac mass at the empty network $\rho = \delta_{g_{\emptyset}}$ is sofic (it suffices to consider matrices with zero entries). Let us consider some more examples.

Example 3.1. Suppose that $S_b = \{-1, +1\}$. Let $Y_1, Y_2, \dots$ be i.i.d. random variables with distribution $\nu \in \mathcal{P}(\mathbb{R})$. Consider the random diagonal matrix $H_n$ with $H_n(i,i) = Y_i$. Then, by the law of large numbers, almost surely $U(H_n) \overset{loc}{\rightsquigarrow} \rho$, where $\rho$ is given by

\[ \rho = \int_{\mathbb{R}} \delta_{g_x} \, d\nu(x), \]

if $g_x$ is the network consisting of a single vertex (the root) with loop weight equal to $x$.

Example 3.2. Suppose that $Z_1, Z_3, Z_5, \dots$ are i.i.d. complex random variables with law $\mu \in \mathcal{P}(\mathbb{C})$ such that $\mu$-a.s. one has either $Z_i = 0$ or $Z_i/|Z_i| \in S_a$. Consider the $n \times n$ matrix $H_n$ such that $H_n(j, j+1) = Z_j$, $H_n(j+1, j) = \overline{Z}_j$, for all odd $1 \leq j \leq n-1$, and all other entries of $H_n$ are zero. By construction, $H_n \in \mathcal{A}_n$ almost surely. From the law of large numbers, almost surely $U(H_n) \overset{loc}{\rightsquigarrow} \rho$, where $\rho$ is given by

\[ \rho = \frac{1}{2} \int_{\mathbb{C}} \big( \delta_{g_z} + \delta_{g_{\bar{z}}} \big) \, d\mu(z), \]

if $g_z$ denotes the equivalence class of the two vertex network $(V, \omega, o)$, with $V = \{o, 1\}$, $\omega(o, 1) = z$, $\omega(1, o) = \bar{z}$ and $\omega(o, o) = \omega(1, 1) = 0$.
Example 3.3. For any fixed $n \in \mathbb{N}$, if $H_n \in \mathcal{A}_n$, then $U(H_n) \in \mathcal{P}_s(\mathcal{G}_*)$. Indeed, take a sequence
of $m \times m$ matrices $A_m \in \mathcal{A}_m$ defined as follows. Let $k, r \ge 0$, with $r < n$, be integers such
that $m = kn + r$, and take $A_m$ as the block diagonal matrix with the first $k$ blocks all equal to
$H_n$ and the last block of size $r$ equal to zero. Then
$$U(A_m) = \frac{n}{n + (r/k)}\, U(H_n) + \frac{1}{1 + (kn/r)}\, \delta_{g_\emptyset}.$$
As $m \to \infty$, $r/k \to 0$, $kn/r \to \infty$, and therefore $U(A_m)$ converges to $U(H_n)$.
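
As a small illustration of Example 3.3 (not part of the original argument), the following Python sketch builds the padded matrix $A_m$ and returns the two mixture weights $kn/m$ and $r/m$ appearing in $U(A_m)$. The function name `pad_block_diagonal` and the list-of-rows representation of matrices are our own illustrative choices.

```python
def pad_block_diagonal(H, m):
    """Build A_m from Example 3.3 for an n x n matrix H (list of rows).

    Writes m = k*n + r with 0 <= r < n, places k copies of H on the
    diagonal and leaves the last r x r block equal to zero.
    Returns (A_m, weight_on_U_H, weight_on_empty_network).
    """
    n = len(H)
    k, r = divmod(m, n)
    A = [[0j] * m for _ in range(m)]
    for block in range(k):
        offset = block * n
        for i in range(n):
            for j in range(n):
                A[offset + i][offset + j] = complex(H[i][j])
    # A uniformly chosen root falls in a copy of H with probability kn/m
    # and on an isolated zero-weight vertex with probability r/m.
    return A, k * n / m, r / m


H = [[0, 1j, 0], [-1j, 2, 0], [0, 0, 0]]
for m in (10, 100, 1000):
    _, w_H, w_empty = pad_block_diagonal(H, m)
    print(m, w_H, w_empty)
```

The weight on $U(H_n)$ equals $kn/m = n/(n + r/k)$, so it tends to one as $m$ grows, in line with the convergence claimed in the example.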

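3.4. Truncated networks. It will be important to work with suitable truncations of the
weights. To this end we consider, for $0 < \theta < 1$, networks $G$ such that for any $(u,v) \in V_G^2$,

$$\deg_G(v) \le \theta^{-2}, \quad \text{and} \quad |\omega_G(u,v)| \ge \theta\, \mathbf{1}(\omega_G(u,v) \ne 0). \tag{28}$$

We call $\mathcal{G}^\theta$ the set of all such networks. Clearly, any $G \in \mathcal{G}^\theta$ is locally finite and has at most
$\theta^{-4}$ outgoing nonzero edges from any vertex. As before, one defines the space $\mathcal{G}_*^\theta$ by taking
equivalence classes of connected rooted networks from $\mathcal{G}^\theta$. We define $\mathcal{P}(\mathcal{G}_*^\theta)$ as the set of
$\rho \in \mathcal{P}(\mathcal{G}_*)$ with support in $\mathcal{G}_*^\theta$, and set $\mathcal{P}_s(\mathcal{G}_*^\theta) = \mathcal{P}(\mathcal{G}_*^\theta) \cap \mathcal{P}_s(\mathcal{G}_*)$. The following lemma follows
from routine diagonal extraction arguments.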

Lemma 3.4. (i) For any $\theta > 0$, $\mathcal{G}_*^\theta$ is a compact set for the local topology.
(ii) $\mathcal{P}_s(\mathcal{G}_*)$ is closed for the local weak topology.

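Next, we describe a canonical way to obtain a network in $\mathcal{G}^\theta$ by truncating a network from
$\mathcal{G}$. For $0 < \theta < 1$, define the two continuous functions

$$\chi_\theta(x) = \begin{cases} 0 & \text{if } x \in [0, \theta) \\ (x - \theta)/\theta & \text{if } x \in [\theta, 2\theta) \\ 1 & \text{if } x \in [2\theta, \infty) \end{cases} \qquad \tilde\chi_\theta(x) = \begin{cases} 1 & \text{if } x \in [0, \theta^{-2} - 1) \\ \theta^{-2} - x & \text{if } x \in [\theta^{-2} - 1, \theta^{-2}) \\ 0 & \text{if } x \in [\theta^{-2}, \infty) \end{cases}$$

that will serve as approximations for the indicator functions $\mathbf{1}(x \ge \theta)$ and $\mathbf{1}(x \le \theta^{-2})$.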

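If $G = (V, \omega)$, we define $\tilde{G}_\theta = (V, \tilde\omega_\theta)$ as the network with vertex set $V$ and, for all $u, v \in V$,

$$\tilde\omega_\theta(u,v) = \omega(u,v)\, \tilde\chi_\theta\big(\deg_G(u) \vee \deg_G(v)\big). \tag{29}$$
Next, we define $G_\theta = (V, \omega_\theta)$ as the network with vertex set $V$ and, for all $u, v \in V$,

$$\omega_\theta(u,v) = \tilde\omega_\theta(u,v)\, \chi_\theta\big(|\tilde\omega_\theta(u,v)|\big). \tag{30}$$

Clearly, $G_\theta$ satisfies (28), and for any $u, v \in V$,

$$\deg_{G_\theta}(u) \le \deg_G(u) \quad \text{and} \quad |\omega_{G_\theta}(u,v)| \le |\omega_G(u,v)|. \tag{31}$$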
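
The two-step truncation (29)-(30) can be sketched in Python for a finite network stored as a Hermitian weight matrix. This is only an illustrative sketch: the function names (`chi`, `chi_tilde`, `degrees`, `truncate`) and the list-of-rows representation are our assumptions, and the degree is taken as $\deg(v) = \sum_u |\omega(u,v)|^2$, as in (47).

```python
def chi(theta, x):
    """Continuous approximation of the indicator 1(x >= theta)."""
    if x < theta:
        return 0.0
    if x < 2 * theta:
        return (x - theta) / theta
    return 1.0


def chi_tilde(theta, x):
    """Continuous approximation of the indicator 1(x <= theta**-2)."""
    cap = theta ** -2
    if x < cap - 1:
        return 1.0
    if x < cap:
        return cap - x
    return 0.0


def degrees(W):
    """deg(v) = sum_u |w(u, v)|^2 for a finite Hermitian weight matrix W."""
    n = len(W)
    return [sum(abs(W[u][v]) ** 2 for u in range(n)) for v in range(n)]


def truncate(W, theta):
    """Apply (29) and then (30) to the weight matrix W (list of rows)."""
    n = len(W)
    deg = degrees(W)
    # Step (29): damp edges that touch a vertex of large degree.
    W_tilde = [[W[u][v] * chi_tilde(theta, max(deg[u], deg[v]))
                for v in range(n)] for u in range(n)]
    # Step (30): damp edges whose remaining weight is small.
    return [[W_tilde[u][v] * chi(theta, abs(W_tilde[u][v]))
             for v in range(n)] for u in range(n)]


W = [[0, 1, 0], [1, 0, 0.3 + 0.4j], [0, 0.3 - 0.4j, 3]]
print(truncate(W, 0.5))
```

In this example vertex 2 has degree $9.25 \ge \theta^{-2} = 4$, so every edge touching it is removed in step (29), while the edge between vertices 0 and 1 survives both steps unchanged.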

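If $g \in \mathcal{G}_*$ and the network $(G, o)$ is in the equivalence class $g$, then $g_\theta \in \mathcal{G}_*^\theta$ is defined as the
equivalence class of $(G_\theta(o), o)$, where $G_\theta$ is defined by (30). This defines a map $g \mapsto g_\theta$ from
$\mathcal{G}_*$ to $\mathcal{G}_*^\theta$. If $\rho \in \mathcal{P}(\mathcal{G}_*)$ and $g$ has law $\rho$, the law of $g_\theta$ defines a new measure $\rho_\theta \in \mathcal{P}(\mathcal{G}_*^\theta)$.

The next lemma follows easily from the continuity of $\chi_\theta, \tilde\chi_\theta$ and the fact that as $\theta \to 0$, for
any $x > 0$, $\chi_\theta(x) \to 1$ and $\tilde\chi_\theta(x) \to 1$.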

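Lemma 3.5 (Continuity of projections).

(i) for $\theta > 0$, the map $g \mapsto g_\theta$ from $\mathcal{G}_*$ to $\mathcal{G}_*^\theta$ is continuous for the local topology;
(ii) for $\theta > 0$, the map $\rho \mapsto \rho_\theta$ from $\mathcal{P}(\mathcal{G}_*)$ to $\mathcal{P}(\mathcal{G}_*^\theta)$ is continuous for the local weak topology;
(iii) as $\theta \to 0$, one has $g_\theta \xrightarrow{\mathrm{loc}} g$ and $\rho_\theta \overset{\mathrm{loc}}{\rightsquigarrow} \rho$, for any $g \in \mathcal{G}_*$ and $\rho \in \mathcal{P}(\mathcal{G}_*)$.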
3.5. Projective topology for locally finite rooted networks. In order to circumvent the
lack of compactness of $\mathcal{P}_s(\mathcal{G}_*)$ w.r.t. the local weak topology, we now introduce a new topology, the
projective topology. For integers $j \ge 1$, set

$$\theta_j = 2^{-j}.$$

Let $p_j : \mathcal{G}_* \to \mathcal{G}_*^{\theta_j}$ be defined by $p_j(g) = g_{\theta_j}$. Similarly, for $1 \le i \le j$, $p_{ij} : \mathcal{G}_*^{\theta_j} \to \mathcal{G}_*^{\theta_i}$ is the map
$p_{ij}(g) = g_{\theta_i}$, $g \in \mathcal{G}_*^{\theta_j}$. The collection $(p_{ij})_{1 \le i \le j}$ is a projective system in the sense that for any
$1 \le i \le j \le k$,

$$p_{ik} = p_{ij} \circ p_{jk}. \tag{32}$$
The latter follows from $2\theta_{j+1} \le \theta_j$ and $\theta_j^{-2} \le \theta_{j+1}^{-2} - 1$.

Define the projective space $\tilde{\mathcal{G}}_* \subset \prod_{j \ge 1} \mathcal{G}_*^{\theta_j}$ as the set of $y = (y_1, y_2, \dots) \in \prod_{j \ge 1} \mathcal{G}_*^{\theta_j}$ such that
for any $i \le j$, $p_{ij}(y_j) = y_i$; see e.g. [10, Appendix B] for more details on projective spaces. One
can identify $\mathcal{G}_*$ and $\tilde{\mathcal{G}}_*$:

Lemma 3.6. The map $\iota(g) = (p_j(g))_{j \ge 1}$ from $\mathcal{G}_*$ to $\tilde{\mathcal{G}}_*$ is bijective.

Proof. The fact that $\iota$ is injective is a consequence of Lemma 3.5 part (iii). It remains to
prove that the map $\iota$ is surjective. Let $y = (y_j) \in \tilde{\mathcal{G}}_*$. One can represent the $y_j$'s by rooted
networks $(G_j, o) = (V_j, \omega_j, o)$ such that $V_j \subset V_{j+1}$. Set $V := \cup_{j \ge 1} V_j$. By adding isolated points,
one can view $(G_j, o)$ as the connected component at the root of the network $G_j = (V, \omega_j)$,
where $\omega_j(u,v) = 0$ whenever either $u$ or $v$ (or both) belong to $V \setminus V_j$. Moreover, one has that
$G_i = (G_j)_{\theta_i}$ for all $i < j$. This sequence of networks is monotone in the sense of (31).

For fixed $u, v \in V$, and $j \in \mathbb{N}$, if $\omega_j(u,v) \ne 0$ then the degree of $u$ and $v$ is bounded by $2^{2j}$ in
any network $G_k$, $k \ge j$, and therefore $\omega_k(u,v) = \omega_{j+1}(u,v)$ for all $k \ge j+1$. In particular, for
all $u, v \in V$ the limit

$$\omega(u,v) = \lim_{j \to \infty} \omega_j(u,v)$$

exists and is finite. The same argument shows that for any $u \in V$, $\lim_{j \to \infty} \deg_{G_j}(u)$ exists and
equals

$$\sum_{v \in V} |\omega(u,v)|^2 < \infty.$$

To prove surjectivity of the map $\iota$, it suffices to take the network $G = (V, \omega)$, and observe that
it satisfies $G_{\theta_j} = G_j$ for all $j \in \mathbb{N}$. □

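With a slight abuse of notation, we will from now on write $\mathcal{G}_*$ in place of $\tilde{\mathcal{G}}_*$. The projective
topology on $\mathcal{G}_*$ is the topology induced by the metric

$$d_{\mathrm{proj}}(g, g') = \sum_{j \ge 1} 2^{-j}\, d_{\mathrm{loc}}(g_{\theta_j}, g'_{\theta_j}).$$

The metric space $(\mathcal{G}_*, d_{\mathrm{proj}})$ is complete and separable. Also, $g_n \xrightarrow{\mathrm{proj}} g$, i.e. $d_{\mathrm{proj}}(g_n, g) \to 0$, if
and only if for any $\theta > 0$, $(g_n)_\theta \xrightarrow{\mathrm{loc}} g_\theta$. The projective weak topology is the weak topology on
$\mathcal{P}(\mathcal{G}_*)$ associated to continuous functions on $(\mathcal{G}_*, d_{\mathrm{proj}})$. We denote the associated convergence
by $\overset{\mathrm{proj}}{\rightsquigarrow}$. Notice that $\rho_n \overset{\mathrm{proj}}{\rightsquigarrow} \rho$ if and only if for any $\theta > 0$, $(\rho_n)_\theta \overset{\mathrm{loc}}{\rightsquigarrow} \rho_\theta$. The topology generated
by $d_{\mathrm{proj}}$ is coarser than the topology generated by $d_{\mathrm{loc}}$, and the weak topology associated to $\overset{\mathrm{proj}}{\rightsquigarrow}$
is coarser than the weak topology associated to $\overset{\mathrm{loc}}{\rightsquigarrow}$.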
Example 3.7. Consider the star shaped rooted network $(G_n, 1) = (V_n, \omega_n, 1)$ where $V_n = \{1, \dots, n\}$,
with $\omega_n(u,v) = \omega_n(v,u) = 1$ if $u = 1$ and $v \ne 1$, and $\omega_n(u,v) = 0$ otherwise. Let $g_n$
denote the associated equivalence class in $\mathcal{G}_*$. Then $g_n$ does not converge in $(\mathcal{G}_*, d_{\mathrm{loc}})$ because
of the diverging degree at the root. However, in $(\mathcal{G}_*, d_{\mathrm{proj}})$, $g_n \xrightarrow{\mathrm{proj}} g_\emptyset$ where $g_\emptyset$ is the empty
network. Moreover, $U(G_n)$ does not converge in $\mathcal{P}(\mathcal{G}_*)$ for $\overset{\mathrm{loc}}{\rightsquigarrow}$; however $U(G_n) \overset{\mathrm{proj}}{\rightsquigarrow} \delta_{g_\emptyset}$.
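
To make Example 3.7 concrete, the sketch below records when the truncation (29) removes every edge of the star, and uses this to bound $d_{\mathrm{proj}}(g_n, g_\emptyset)$. It is an illustration under stated assumptions: we assume that $d_{\mathrm{loc}} \le 1$ and that $d_{\mathrm{proj}} = \sum_j 2^{-j} d_{\mathrm{loc}}$ at the levels $\theta_j = 2^{-j}$; the function names are hypothetical.

```python
def star_truncation_is_empty(n, theta):
    """True if the star on n >= 2 vertices loses every edge under (29).

    Every edge of the star touches the centre, whose degree is n - 1,
    so the cutoff in (29) is evaluated at max(deg(u), deg(v)) = n - 1
    and vanishes as soon as n - 1 >= theta**-2.
    """
    return n - 1 >= theta ** -2


def dproj_upper_bound(n, levels=60):
    """Bound on d_proj(g_n, empty), ASSUMING d_loc <= 1 and
    d_proj = sum_j 2**-j * d_loc at the levels theta_j = 2**-j."""
    # Levels where the truncated star is already empty contribute zero.
    head = sum(2.0 ** -j for j in range(1, levels + 1)
               if not star_truncation_is_empty(n, 2.0 ** -j))
    # The levels beyond `levels` contribute at most 2**-levels in total.
    return head + 2.0 ** -levels


for n in (2, 17, 10 ** 3, 10 ** 6):
    print(n, dproj_upper_bound(n))
```

The bound decreases to zero as $n$ grows because the number of levels $j$ with $4^j > n - 1$ that still see a nonempty truncation shrinks to the tail of the geometric series.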

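Lemma 3.8. (i) $\mathcal{G}_*$ is compact for the projective topology.
(ii) $\mathcal{P}_s(\mathcal{G}_*)$ is compact for the projective weak topology.

Proof. Statement (i) is a consequence of Tychonoff's theorem and Lemma 3.4(i). It implies that
$\mathcal{P}(\mathcal{G}_*)$ is compact for the projective weak topology. Hence, to prove statement (ii), it is sufficient
to check that $\mathcal{P}_s(\mathcal{G}_*)$ is closed. Assume that $\rho_n \in \mathcal{P}_s(\mathcal{G}_*)$ and $\rho_n \overset{\mathrm{proj}}{\rightsquigarrow} \rho$. Then for any $\theta > 0$,
$(\rho_n)_\theta \in \mathcal{P}_s(\mathcal{G}_*)$ and $(\rho_n)_\theta \overset{\mathrm{loc}}{\rightsquigarrow} \rho_\theta$. By Lemma 3.4(ii), we deduce that $\rho_\theta \in \mathcal{P}_s(\mathcal{G}_*)$. However,
as $\theta \to 0$, using Lemma 3.5 we find $\rho_\theta \overset{\mathrm{loc}}{\rightsquigarrow} \rho$. By appealing to Lemma 3.4(ii) again we get
$\rho \in \mathcal{P}_s(\mathcal{G}_*)$. □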

3.6. Large deviations for the network $G_n$. For a rooted network $(G, o)$, $G = (V_G, \omega_G)$,
define the functions

$$\psi(G, o) = |\omega_G(o, o)|^\alpha \quad \text{and} \quad \phi(G, o) = \sum_{v \in V_G \setminus o} |\omega_G(o, v)|^\alpha. \tag{33}$$

Since these functions are invariant under rooted isomorphisms one can take them as functions
on $\mathcal{G}_*$. Then, if $\rho \in \mathcal{P}(\mathcal{G}_*)$ we write $\mathbb{E}_\rho \psi$ and $\mathbb{E}_\rho \phi$ to denote the corresponding expectations.
We remark that for any $\theta > 0$, the restriction of $\phi, \psi$ to $(\mathcal{G}_*^\theta, d_{\mathrm{loc}})$ gives two bounded continuous
functions. Therefore, as functions on $(\mathcal{G}_*, d_{\mathrm{proj}})$, $\phi$ and $\psi$ are lower semi-continuous.

We now come back to the random matrix $C = C(n)$ defined in (11). For integer $n \ge 1$,
consider the associated network

$$G_n = (V_n, \omega_n), \quad \text{with } V_n = \{1, \dots, n\} \text{ and } \omega_n(i,j) = C_{ij}. \tag{34}$$

From the first Borel-Cantelli lemma, almost surely the matrix $C$ has no nonzero entry for $n$ large
enough. Therefore, almost surely, $U(G_n) \overset{\mathrm{loc}}{\rightsquigarrow} \delta_{g_\emptyset}$, the Dirac mass at the empty network. The
next proposition gives the large deviation principle for $U(G_n)$ for the projective weak topology.

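Proposition 3.9. $U(G_n)$ satisfies an LDP on $\mathcal{P}(\mathcal{G}_*)$ equipped with the projective weak topology,
with speed $n^{1+\alpha/2}$ and good rate function $I : \mathcal{P}(\mathcal{G}_*) \to [0, \infty]$ defined by

$$I(\rho) = \begin{cases} b\, \mathbb{E}_\rho \psi + \frac{a}{2}\, \mathbb{E}_\rho \phi & \text{if } \rho \in \mathcal{P}_s(\mathcal{G}_*) \\ +\infty & \text{if } \rho \notin \mathcal{P}_s(\mathcal{G}_*) \end{cases} \tag{35}$$

If $a$ or $b$ is equal to $\infty$, the above formula holds with the convention $\infty \times 0 = 0$.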
Proof. For ease of notation, we define the random probability measure

$$\rho_n = U(G_n).$$

By construction, $\rho_n \in \mathcal{P}_s(\mathcal{G}_*)$, see Example 3.3, and therefore it is sufficient to establish the
LDP on the space $\mathcal{P}_s(\mathcal{G}_*)$ with good rate function $I(\rho) = b\, \mathbb{E}_\rho \psi + \frac{a}{2}\, \mathbb{E}_\rho \phi$, $\rho \in \mathcal{P}_s(\mathcal{G}_*)$.

Let $B_{\mathrm{proj}}(\rho, \delta)$ (resp. $B_{\mathrm{loc}}(\rho, \delta)$) denote the closed ball with radius $\delta > 0$ and center $\rho \in \mathcal{P}_s(\mathcal{G}_*)$
for the Lévy metric associated to the projective weak topology (resp. local weak topology).
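Upper bound. By Lemma 3.8(ii), $\mathcal{P}_s(\mathcal{G}_*)$ is compact. Hence it is sufficient to prove that for any
$\rho \in \mathcal{P}_s(\mathcal{G}_*)$,

$$\limsup_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\rho_n \in B_{\mathrm{proj}}(\rho, \delta)\big) \le -b\, \mathbb{E}_\rho \psi - \frac{a}{2}\, \mathbb{E}_\rho \phi. \tag{36}$$

Assume first that $\mathbb{E}_\rho \psi$ and $\mathbb{E}_\rho \phi$ are finite. From standard properties of weak convergence, and
the fact that $\phi, \psi$ are lower semi-continuous on $(\mathcal{G}_*, d_{\mathrm{proj}})$, it follows that the maps $\rho \mapsto \mathbb{E}_\rho \psi$
and $\rho \mapsto \mathbb{E}_\rho \phi$ are lower semi-continuous on $\mathcal{P}_s(\mathcal{G}_*)$ w.r.t. the projective weak topology. Hence,
we have for some continuous function $h(\delta)$ with $h(0) = 0$,

$$\mathbb{P}\big(\rho_n \in B_{\mathrm{proj}}(\rho, \delta)\big) \le \mathbb{P}\big(\mathbb{E}_{\rho_n} \psi \ge \mathbb{E}_\rho \psi - h(\delta)\,;\ \mathbb{E}_{\rho_n} \phi \ge \mathbb{E}_\rho \phi - h(\delta)\big).$$

By definition,

$$\mathbb{E}_{\rho_n} \psi = \frac{1}{n} \sum_{i=1}^n |C_{ii}|^\alpha \quad \text{and} \quad \mathbb{E}_{\rho_n} \phi = \frac{2}{n} \sum_{1 \le i < j \le n} |C_{ij}|^\alpha$$

are independent random variables. Therefore,

$$\mathbb{P}\big(\rho_n \in B_{\mathrm{proj}}(\rho, \delta)\big) \le \mathbb{P}\big(\mathbb{E}_{\rho_n} \psi \ge \mathbb{E}_\rho \psi - h(\delta)\big)\, \mathbb{P}\big(\mathbb{E}_{\rho_n} \phi \ge \mathbb{E}_\rho \phi - h(\delta)\big). \tag{37}$$

To prove the part of the bound involving $\phi$, one may assume $\mathbb{E}_\rho \phi > 0$. Take $\delta$ small enough, so
that $s := \mathbb{E}_\rho \phi - h(\delta) > 0$. From Chernoff's bound, for any $0 < a_1 < a$,

$$\mathbb{P}\big(\mathbb{E}_{\rho_n} \phi \ge s\big) \le \exp\Big(-\frac{a_1}{2}\, n^{1+\alpha/2} s\Big) \Big(\mathbb{E} \exp\big(a_1 |X_{12}|^\alpha \mathbf{1}_{\varepsilon(n)\sqrt{n} \le |X_{12}| \le \varepsilon(n)^{-1}\sqrt{n}}\big)\Big)^{n(n-1)/2}.$$

By assumption, there exists $a_2 \in (a_1, a)$, such that for all $t > 0$ large enough,

$$\mathbb{P}(|X_{12}| \ge t) \le \exp(-a_2 t^\alpha).$$

Using (16), one deduces that

$$\mathbb{E} \exp\big(a_1 |X_{12}|^\alpha \mathbf{1}_{\varepsilon(n)\sqrt{n} \le |X_{12}| \le \varepsilon(n)^{-1}\sqrt{n}}\big) \le 1 + e^{-(a_2 - a_1)\varepsilon(n)^\alpha n^{\alpha/2}} + \alpha a_1 \int_{\varepsilon(n)\sqrt{n}}^\infty x^{\alpha - 1} e^{-(a_2 - a_1) x^\alpha}\, dx \le 1 + \frac{a_2}{a_2 - a_1}\, e^{-(a_2 - a_1)\varepsilon(n)^\alpha n^{\alpha/2}}.$$

Therefore,

$$\mathbb{P}\big(\mathbb{E}_{\rho_n} \phi \ge s\big) \le \exp\Big(-\frac{a_1}{2}\, n^{1+\alpha/2} s + \frac{a_2 n^2}{2(a_2 - a_1)}\, e^{-(a_2 - a_1)\varepsilon(n)^\alpha n^{\alpha/2}}\Big).$$

We have thus proved that

$$\limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\mathbb{E}_{\rho_n} \phi \ge s\big) \le -\frac{a_1}{2}\big(\mathbb{E}_\rho \phi - h(\delta)\big).$$

Since the above inequality is true for any $a_1 < a$, it also holds for $a_1 = a$. Similarly, one has

$$\limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\mathbb{E}_{\rho_n} \psi \ge \mathbb{E}_\rho \psi - h(\delta)\big) \le -b\big(\mathbb{E}_\rho \psi - h(\delta)\big).$$

From (37), it follows that (36) holds under the assumption that both $\mathbb{E}_\rho \psi, \mathbb{E}_\rho \phi$ are finite. However,
if either $\mathbb{E}_\rho \psi$ or $\mathbb{E}_\rho \phi$ is infinite, a straightforward adaptation of the above argument shows
that the left hand side of (36) is $-\infty$. □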
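Lower bound. It is sufficient to prove that for any $\rho \in \mathcal{P}_s(\mathcal{G}_*)$ and any $\delta > 0$,

$$\liminf_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\rho_n \in B_{\mathrm{proj}}(\rho, \delta)\big) \ge -b\, \mathbb{E}_\rho \psi - \frac{a}{2}\, \mathbb{E}_\rho \phi. \tag{38}$$

In order to prove (38), we may assume without loss of generality that $I(\rho) = b\, \mathbb{E}_\rho \psi + \frac{a}{2}\, \mathbb{E}_\rho \phi < \infty$.
By monotonicity one has that

$$\lim_{j \to \infty} I(\rho_{\theta_j}) = I(\rho).$$

Therefore, since the projective topology is generated from the product topology on $\prod_{j \ge 1} \mathcal{G}_*^{\theta_j}$,
it is sufficient to prove (38) for all $\rho \in \mathcal{P}_s(\mathcal{G}_*^\theta)$, for all $0 < \theta < 1$. Finally, since the local weak
topology is finer than the projective weak topology, it is enough to prove that for any $0 < \theta < 1$,
$\rho \in \mathcal{P}_s(\mathcal{G}_*^\theta)$ and $\delta > 0$,

$$\liminf_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\rho_n \in B_{\mathrm{loc}}(\rho, \delta)\big) \ge -b\, \mathbb{E}_\rho \psi - \frac{a}{2}\, \mathbb{E}_\rho \phi. \tag{39}$$

Let us start with some simple consequences of Assumption 1. From (3), there exists a positive
sequence $\eta_n$ converging to $0$ such that, for any $s \ge \varepsilon(n) = 1/\log n$,

$$e^{-(a + \eta_n) s^\alpha n^{\alpha/2}} \le \mathbb{P}\big(|X_{12}| \ge s\sqrt{n}\big) \le e^{-(a - \eta_n) s^\alpha n^{\alpha/2}}. \tag{40}$$

In particular, if $s \ge \varepsilon(n)$ then for any $\gamma > 0$, for all $n$ large enough,

$$\mathbb{P}\big(|X_{12}| \in [s, s + \gamma)\sqrt{n}\big) \ge \tfrac{1}{2}\, e^{-(a + \eta_n) s^\alpha n^{\alpha/2}}.$$

Therefore, using (5), one finds that there exists a sequence $a_n \to a$ such that for every $\gamma > 0$,
for all $n$ large enough, for every $z \in \mathbb{C}$, with $|z| \ge \varepsilon(n)$, $z/|z| \in S_a$,

$$\mathbb{P}\big(X_{12}/\sqrt{n} \in B_{\mathbb{C}}(z, \gamma)\big) \ge e^{-a_n |z|^\alpha n^{\alpha/2}}, \tag{41}$$

where $S_a$ denotes the compact support of the measure $\vartheta_a \in \mathcal{P}(\mathbb{S}^1)$ associated to $X_{12}$, and
$B_{\mathbb{C}}(z, \gamma)$ is the euclidean ball in $\mathbb{C}$, with center $z$ and radius $\gamma > 0$.

Similarly, there exists a sequence $b_n \to b$ such that for every $\gamma > 0$, for all $n$ large enough, for
every $x \in \mathbb{R}$, with $|x| \ge \varepsilon(n)$, $x/|x| \in S_b$,

$$\mathbb{P}\big(X_{11}/\sqrt{n} \in B_{\mathbb{R}}(x, \gamma)\big) \ge e^{-b_n |x|^\alpha n^{\alpha/2}}. \tag{42}$$

Since $\rho \in \mathcal{P}_s(\mathcal{G}_*^\theta)$, there exists a sequence of matrices $H_n \in \mathcal{A}_n$ such that the associated
network as in (27) is in $\mathcal{G}^\theta$ and such that $U(H_n) \overset{\mathrm{loc}}{\rightsquigarrow} \rho$. In particular, for $n$ sufficiently large one
has

$$U(H_n) \in B_{\mathrm{loc}}(\rho, \delta/2).$$

From Lemma 3.10 there exists $\gamma = \gamma(\delta, \theta) > 0$ such that if $|\omega_{G_n}(i,j) - H_n(i,j)| \le \gamma$
for all $1 \le i \le j \le n$, then $\rho_n = U(G_n) \in B_{\mathrm{loc}}(U(H_n), \delta/2)$. Then, by the triangle
inequality, for all $n$ large enough,

$$\mathbb{P}\big(\rho_n \in B_{\mathrm{loc}}(\rho, \delta)\big) \ge \mathbb{P}\big(\rho_n \in B_{\mathrm{loc}}(U(H_n), \delta/2)\big) \ge \mathbb{P}\Big(\max_{1 \le i \le n} |\omega_{G_n}(i,i) - H_n(i,i)| \le \gamma\,,\ \max_{1 \le i < j \le n} |\omega_{G_n}(i,j) - H_n(i,j)| \le \gamma\Big).$$

Independence of the weights $\omega_{G_n}(i,j) = C_{ij}$, $1 \le i \le j \le n$ then gives

$$\mathbb{P}\big(\rho_n \in B_{\mathrm{loc}}(\rho, \delta)\big) \ge \prod_{i=1}^n \mathbb{P}\big(|C_{ii} - H_n(i,i)| \le \gamma\big) \prod_{1 \le i < j \le n} \mathbb{P}\big(|C_{ij} - H_n(i,j)| \le \gamma\big).$$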
i=l l^j<j^n 
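Notice that whenever $H_n(i,j) \ne 0$ one has $|H_n(i,j)| \ge \theta$, and thus using (40) and (42) one has
for all $i = 1, \dots, n$:

$$\mathbb{P}\big(|C_{ii} - H_n(i,i)| \le \gamma\big) \ge e^{-b_n |H_n(i,i)|^\alpha n^{\alpha/2}}\, \mathbf{1}\big(|H_n(i,i)| > 0\big) + \big(1 - e^{-c\, \varepsilon(n)^\alpha n^{\alpha/2}}\big)\, \mathbf{1}\big(|H_n(i,i)| = 0\big),$$

where the constant $c$ satisfies $c \ge b/2 > 0$. Similarly, using (41), for all $i < j$ and for some
$c \ge a/2 > 0$:

$$\mathbb{P}\big(|C_{ij} - H_n(i,j)| \le \gamma\big) \ge e^{-a_n |H_n(i,j)|^\alpha n^{\alpha/2}}\, \mathbf{1}\big(|H_n(i,j)| > 0\big) + \big(1 - e^{-c\, \varepsilon(n)^\alpha n^{\alpha/2}}\big)\, \mathbf{1}\big(|H_n(i,j)| = 0\big).$$

Observe that

$$\sum_{i=1}^n |H_n(i,i)|^\alpha = n\, \mathbb{E}_{U(H_n)} \psi \quad \text{and} \quad \sum_{1 \le i < j \le n} |H_n(i,j)|^\alpha = \frac{n}{2}\, \mathbb{E}_{U(H_n)} \phi.$$

Summarizing, using $\big(1 - e^{-c\, \varepsilon(n)^\alpha n^{\alpha/2}}\big)^{n^2} \ge 1/2$ for $n$ large enough, one finds

$$\mathbb{P}\big(\rho_n \in B_{\mathrm{loc}}(\rho, \delta)\big) \ge \frac{1}{2} \exp\Big(-b_n\, n^{1+\alpha/2}\, \mathbb{E}_{U(H_n)} \psi - \frac{a_n}{2}\, n^{1+\alpha/2}\, \mathbb{E}_{U(H_n)} \phi\Big). \tag{43}$$

Since $\psi$ and $\phi$ are continuous and bounded on $\mathcal{G}_*^\theta$, one has $\mathbb{E}_{U(H_n)} \psi \to \mathbb{E}_\rho \psi$ and $\mathbb{E}_{U(H_n)} \phi \to \mathbb{E}_\rho \phi$,
as $n \to \infty$. Moreover, $a_n \to a$ and $b_n \to b$. Therefore, (43) implies the desired bound (39). This
concludes the proof of the lower bound. □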

The next lemma was used in the proof of the lower bound of Proposition 3.9. While the
estimate is somewhat rough, it is crucial that it is uniform in the cardinality $n$ of the vertex set.

Lemma 3.10. Let $0 < \theta < 1$ and $\delta > 0$. There exists $\gamma = \gamma(\delta, \theta) > 0$ such that for any integer
$n \ge 1$, for any networks $G \in \mathcal{G}$, $H \in \mathcal{G}^\theta$ with common vertex set $V = \{1, \dots, n\}$ such that

$$\max_{u, v \in V} |\omega_G(u,v) - \omega_H(u,v)| \le \gamma, \tag{44}$$

then

$$\max_{u \in V} d_{\mathrm{loc}}\big((G(u), u), (H(u), u)\big) \le \delta. \tag{45}$$

In particular,

$$U(G) \in B_{\mathrm{loc}}(U(H), \delta).$$

Proof. Each edge of H has a weight bounded by 9~^. This implies that in H each path whose 
total length is bounded by t > 0, contains at most t'^/9'^ edges. Moreover, H has at most 6*"^ 
outgoing edges from any vertex. Hence, H has at most m = 9~^^ 1^ vertices at distance less 
than t from any given vertex. Fix the root u gV and t > 0. From the pigeonhole principle, 
there exists to > such that tjl < to < t, and an interval I = [to — t/(8m),to + t/{8m)], such 
that there is no vertex within distance s £ I from u in H. 

If ei, • • • , efc are the edges on a path in H, then provided that < 7 < 9/2, 



k 



i=l i=l 



2 



i=l 



where the first inequality follows from the joint convexity of [0, 00)^ B (x, y) 1— )■ {^/x — y^)^ 
and the second inequality follows from |a;j:/(ej)| ^ 9 and the assumption ()44p . In the worst 
possible case one can take k = t'^/9'^ for the number of edges at distance to from u. Together 
with the previous observation, this shows that if 27-v/fc/^^ ^ t/{8m), i.e. 7 ^ 6*^(16^7), then 
the neighborhood of $u$ consisting of vertices within distance $t_0$ in $G$ and in $H$ have the same
vertex set. From the definition of $d_{\mathrm{loc}}$, this choice of $\gamma$ in (44) implies that

dUiGiu),u),{H{u),u)) ^ \ ^ - 

i + 7 A ro * 

Thus, taking t = 2/5, one has (gSD, as soon as e.g. 7 ^ 6'V(16m) = 6^+^^'^^^^^^ /IQ. From the 
definition of the Levy distance, it immediately follows that U{G) G B\oc(U{H),6). □ 

3.7. Spectral measure. For a network $G = (V, \omega) \in \mathcal{G}^\theta$, we may define the bounded linear
operator $T$ on the Hilbert space $\ell^2(V)$ by

$$T e_v = \sum_{u \in V} \omega(u, v)\, e_u, \tag{46}$$

for any $v \in V$, where $\{e_u, u \in V\}$ denotes the canonical orthonormal basis of $\ell^2(V)$. $T$ is
bounded since

$$\|T e_v\|_2^2 = \sum_{u \in V} |\omega(u,v)|^2 = \deg(v) \le \theta^{-2}. \tag{47}$$

Also, since $G$ is hermitian, $T$ is self-adjoint. We may thus define the spectral measure at vector
$e_v$, as the unique probability measure $\mu_T^v$ on $\mathbb{R}$ such that for any integer $k \ge 1$,

$$\int x^k\, d\mu_T^v = \langle e_v, T^k e_v \rangle. \tag{48}$$

Notice that for rooted networks $(G, o)$ with $G \in \mathcal{G}^\theta$, the associated spectral measure $\mu_T^o$ is
constant on the equivalence class of $(G, o)$, so that $\mu_T^o$ can be defined as a measurable map from
$\mathcal{G}_*^\theta$ to $\mathcal{P}(\mathbb{R})$. Thus, if $\rho \in \mathcal{P}(\mathcal{G}_*^\theta)$, one can define the spectral measure of $\rho$ as

$$\mu_\rho = \mathbb{E}_\rho\, \mu_T^o. \tag{49}$$
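
For a finite network, the moments in (48) can be computed by repeated application of $T$. The Python sketch below does this for the two-vertex network $g_z$ of Example 3.2, whose spectral measure at the root is $\frac12(\delta_{|z|} + \delta_{-|z|})$, so that odd moments vanish and the $k$-th even moment is $|z|^k$. The helper names and the list-of-rows matrix representation are our own illustrative choices.

```python
def root_moment(T, v, k):
    """Return <e_v, T^k e_v> for a finite Hermitian matrix T (list of rows)."""
    n = len(T)
    x = [0j] * n
    x[v] = 1 + 0j
    for _ in range(k):
        # One application of T: (T x)_u = sum_w T[u][w] * x[w].
        x = [sum(T[u][w] * x[w] for w in range(n)) for u in range(n)]
    return x[v]


def two_vertex_network(z):
    """Weights of the two-vertex network g_z: w(o,1) = z, w(1,o) = conj(z)."""
    return [[0j, z], [z.conjugate(), 0j]]


T = two_vertex_network(3 + 4j)
for k in range(1, 5):
    print(k, root_moment(T, 0, k))
```

Here the matrix entry `T[u][v]` plays the role of $\omega(u,v)$ in (46), and vertex `0` is the root $o$.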

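In general, if $\rho \in \mathcal{P}(\mathcal{G}_*)$, then (49) allows one to define the spectral measures $\mu_{\rho_\theta}$, where the
truncated network $\rho_\theta$ is defined as in Lemma 3.5. When $\rho \in \mathcal{P}_s(\mathcal{G}_*)$, it is possible to define a
notion of spectral measure $\mu_\rho$ as the limit of $\mu_{\rho_\theta}$ as $\theta \to 0$. More precisely, for a rooted network
$(G, o)$, $G \in \mathcal{G}$, and for $\beta > 0$, let

$$\xi_\beta(G, o) = \sum_{v \in V_G} |\omega_G(o, v)|^\beta.$$

Since $\xi_\beta$ is constant on the equivalence class of $(G, o)$, it can be seen as a function on $\mathcal{G}_*$. For
$\beta > 0$, define

$$\mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*) = \{\rho \in \mathcal{P}_s(\mathcal{G}_*) : \mathbb{E}_\rho\, \xi_\beta < \tau\}.$$

Lemma 3.11 and Lemma 3.12 below are suitable extensions of analogous statements in [8, 9].
The first result allows one to define the spectral measure $\mu_\rho$ of any $\rho \in \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$.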

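Lemma 3.11. Let $0 < \beta < 2$, $\tau > 1$ and $\rho \in \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$. Then the weak limit

$$\mu_\rho := \lim_{\theta \to 0} \mu_{\rho_\theta}$$

exists in $\mathcal{P}(\mathbb{R})$.

Proof. To prove the lemma we are going to show that the sequence $\mu_{\rho_\theta}$, $\theta \to 0$, is Cauchy w.r.t.
the metric (12).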
By assumption, there exists a sequence $G_n$ of networks on $\{1, \dots, n\}$ such that $\rho_n \overset{\mathrm{loc}}{\rightsquigarrow} \rho$,
where $\rho_n = U(G_n)$. Call $T_n$ the associated hermitian matrix. The empirical distribution of the
eigenvalues of $T_n$ satisfies, by the spectral theorem,

$$\mu_{T_n} = \frac{1}{n} \sum_{k=1}^n \delta_{\lambda_k(T_n)} = \frac{1}{n} \sum_{u=1}^n \mu_{T_n}^u, \tag{50}$$

where $\mu_{T_n}^u$ stands for the spectral measure at $u$; see (48).

The truncations $(\rho_n)_\theta$ and $\rho_\theta$ satisfy $(\rho_n)_\theta \overset{\mathrm{loc}}{\rightsquigarrow} \rho_\theta$ by Lemma 3.5(ii). Moreover for all $\theta > 0$,

$$\mu_{(\rho_n)_\theta} \rightsquigarrow \mu_{\rho_\theta}. \tag{51}$$

To prove (51), let $T^\theta$ denote the random bounded self-adjoint operator associated to $\rho_\theta$ via (46)
and let $T_n^\theta$ be the matrices associated to $(\rho_n)_\theta$. One can realize these operators on a common
Hilbert space $\ell^2(V)$. Since $(\rho_n)_\theta \overset{\mathrm{loc}}{\rightsquigarrow} \rho_\theta$, from the Skorokhod representation theorem one can
define a common probability space such that the associated networks converge locally almost
surely, so that a.s. $T_n^\theta e_v \to T^\theta e_v$, in $\ell^2(V)$, for any $v \in V$. This implies the strong resolvent
convergence, see e.g. [13, Theorem VIII.25(a)], and in particular that for any $v \in V$, a.s.

$$\mu_{T_n^\theta}^v \rightsquigarrow \mu_{T^\theta}^v.$$

Then (51) follows by applying this to $v = o$ and taking expectation.



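Let $\tilde{T}_n^\theta, T_n^\theta$ be the matrices associated to $(\tilde{G}_n)_\theta$ and $(G_n)_\theta$ respectively, where $(\tilde{G}_n)_\theta$ is defined
according to (29), and $(G_n)_\theta$ according to (30). From (14), using the triangle inequality, Lemma
B.1 and Lemma B.2,

$$d\big(\mu_{T_n}, \mu_{T_n^\theta}\big) \le \frac{1}{n} \operatorname{rank}\big(\tilde{T}_n^\theta - T_n\big) + \Big(\frac{1}{n} \operatorname{tr}\big(\tilde{T}_n^\theta - T_n^\theta\big)^2\Big)^{1/2}.$$

From the definition (29) one has

$$\frac{1}{n} \operatorname{rank}\big(\tilde{T}_n^\theta - T_n\big) \le \frac{2}{n} \sum_{i=1}^n \mathbf{1}\big(\deg_{G_n}(i) \ge \theta^{-2} - 1\big) = 2\, \mathbb{P}_{\rho_n}\big(\deg_G(o) \ge \theta^{-2} - 1\big).$$

From (30) one finds

$$\frac{1}{n} \operatorname{tr}\big(\tilde{T}_n^\theta - T_n^\theta\big)^2 \le \frac{1}{n} \sum_{i,j=1}^n |\omega_{G_n}(i,j)|^2\, \mathbf{1}\big(|\omega_{G_n}(i,j)| \le 2\theta\big)\, \mathbf{1}\big(\deg_{G_n}(i) \le \theta^{-2}\big) = \mathbb{E}_{\rho_n}\, \mathbf{1}\big(\deg_G(o) \le \theta^{-2}\big) \sum_v |\omega_G(o,v)|^2\, \mathbf{1}\big(|\omega_G(o,v)| \le 2\theta\big).$$

Letting $n$ go to infinity, using $\mu_{T_n^\theta} = \mu_{(\rho_n)_\theta}$, and (51), one has $d(\mu_{T_n^\theta}, \mu_{T_n^{\theta'}}) \to d(\mu_{\rho_\theta}, \mu_{\rho_{\theta'}})$.
Therefore, by the triangle inequality and the dominated convergence theorem, for any $0 < \theta' <
\theta < 1/\sqrt{2}$,

$$d\big(\mu_{\rho_\theta}, \mu_{\rho_{\theta'}}\big) \le 4\, \mathbb{P}_\rho\big(\deg_G(o) \ge \theta^{-2}/2\big) + 2 \Big(\mathbb{E}_\rho\, \mathbf{1}\big(\deg_G(o) \le \theta^{-2}\big) \sum_v |\omega_G(o,v)|^2\, \mathbf{1}\big(|\omega_G(o,v)| \le 2\theta\big)\Big)^{1/2}.$$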
V 

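Notice that, for $\beta \in (0, 2)$,

$$\deg_G(o)^{\beta/2} = \Big(\sum_v |\omega_G(o,v)|^2\Big)^{\beta/2} \le \sum_v |\omega_G(o,v)|^\beta = \xi_\beta(G, o), \tag{52}$$
where we use that $\sum_{i=1}^k a_i^r \le \big(\sum_{i=1}^k a_i\big)^r$ for $a_i \ge 0$, $r \ge 1$ and $k \in \mathbb{N}$. Moreover,

$$\sum_v |\omega_G(o,v)|^2\, \mathbf{1}\big(|\omega_G(o,v)| \le \theta\big) \le \theta^{2-\beta}\, \xi_\beta(G, o).$$

Hence, from Markov's inequality,

$$d\big(\mu_{\rho_\theta}, \mu_{\rho_{\theta'}}\big) \le 4 \cdot 2^{\beta/2}\, \theta^\beta\, \mathbb{E}_\rho \xi_\beta + 2 \big((2\theta)^{2-\beta}\, \mathbb{E}_\rho \xi_\beta\big)^{1/2}. \tag{53}$$

By assumption $\mathbb{E}_\rho \xi_\beta$ is finite. Hence, the sequence is Cauchy. □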

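Lemma 3.12. For any $\beta \in (0, 2)$, $\tau > 0$, the map $\rho \mapsto \mu_\rho$ from $\mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$ to $\mathcal{P}(\mathbb{R})$ is continuous
for the projective weak topology.

Proof. For any $\theta > 0$, from (53),

$$d\big(\mu_{\rho_\theta}, \mu_\rho\big) \le c\big(\theta^\beta + \theta^{1 - \beta/2}\big), \tag{54}$$

with a constant $c = c(\tau) > 0$. Hence from the triangle inequality, if $\rho, \rho' \in \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$,

$$d\big(\mu_\rho, \mu_{\rho'}\big) \le 2c\big(\theta^\beta + \theta^{1 - \beta/2}\big) + d\big(\mu_{\rho_\theta}, \mu_{\rho'_\theta}\big).$$

Consider a sequence $\rho'$ such that $\rho' \overset{\mathrm{proj}}{\rightsquigarrow} \rho$. Then $\rho'_\theta \overset{\mathrm{loc}}{\rightsquigarrow} \rho_\theta$ and therefore, with the same
argument used in the proof of (51) above one finds

$$\lim_{\rho' \to \rho} d\big(\mu_{\rho_\theta}, \mu_{\rho'_\theta}\big) = 0.$$

We deduce that

$$\limsup_{\rho' \to \rho}\, d\big(\mu_\rho, \mu_{\rho'}\big) \le 2c\big(\theta^\beta + \theta^{1 - \beta/2}\big).$$

Since $\theta > 0$ is arbitrarily small, the statement of the lemma follows. □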

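3.8. Large deviations for the empirical spectral measure $\mu_C$. We can apply the previous
results to the empirical spectral measure $\mu_C$, where $C = C(n)$ is the random matrix defined
in (11). So far we have defined $\mu_\rho$ for every $\rho \in \cup_{0 < \beta < 2} \cup_{\tau > 1} \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$. If $\rho \in \mathcal{P}_s(\mathcal{G}_*)$ but
$\rho \notin \cup_{0 < \beta < 2} \cup_{\tau > 1} \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$, then we set

$$\mu_\rho = \delta_0.$$
Proposition 3.13. The empirical spectral measures $\mu_C$ satisfy an LDP on $\mathcal{P}(\mathbb{R})$ equipped with
the weak topology, with speed $n^{1+\alpha/2}$ and good rate function $\Phi$ given by

$$\Phi(\nu) = \inf\{I(\rho)\,,\ \rho \in \mathcal{P}_s(\mathcal{G}_*) : \mu_\rho = \nu\}, \tag{55}$$
where $I(\rho)$ is the good rate function in Proposition 3.9.

Proof. Recall that by (50) the network $G_n$ in (34) satisfies $\rho_n = U(G_n)$ and

$$\mu_{\rho_n} = \mu_C.$$

Notice that if $c = (\frac{a}{2} \wedge b)$, then

$$I(\rho) \ge c\, \mathbb{E}_\rho \xi_\alpha. \tag{56}$$

Hence, by Lemma 3.12 the map $\rho \mapsto \mu_\rho$ is continuous on the domain of $I(\rho)$. It is thus possible
to apply a contraction principle to get the LDP for $\mu_{\rho_n}$ from the LDP for $\rho_n$. To be more precise,
if $B$ is a Borel set in $\mathcal{P}(\mathbb{R})$, we write for any $\tau > 1$,

$$\mathbb{P}\big(\mu_{\rho_n} \in B\big) = \mathbb{P}\big(\mu_{\rho_n} \in B\,;\ \rho_n \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)\big) + \mathbb{P}\big(\mu_{\rho_n} \in B\,;\ \rho_n \notin \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)\big).$$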
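We start with the lower bound. Assume that $B$ is an open set. For each $\tau > 0$, by Lemma
3.12 the function $f_\tau : \rho \mapsto \mu_\rho$ from $\mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)$ to $\mathcal{P}(\mathbb{R})$ is continuous. Hence $f_\tau^{-1}(B)$ is an open
subset of $\mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)$. By Proposition 3.9 it follows that

$$-\inf_{\rho \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*) :\, \mu_\rho \in B} I(\rho) \le \liminf_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\mu_{\rho_n} \in B\big).$$

Using (56) one has

$$-\inf_{\rho \in \mathcal{P}_s(\mathcal{G}_*) :\, \mu_\rho \in B} I(\rho) \le (-c\tau) \vee \liminf_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\mu_{\rho_n} \in B\big).$$

Letting $\tau$ tend to infinity, we obtain the desired lower bound:

$$-\inf_{\nu \in B} \Phi(\nu) \le \liminf_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\mu_{\rho_n} \in B\big).$$

To prove the upper bound, assume that $B$ is closed. By Lemma 3.12, $f_\tau^{-1}(B)$ is a closed subset of
$\mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)$. Proposition 3.9 yields

$$\limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\mu_{\rho_n} \in B\,;\ \rho_n \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)\big) \le -\inf_{\rho \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*) :\, \mu_\rho \in B} I(\rho),$$

and

$$\limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\rho_n \notin \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)\big) \le -c\tau.$$

We have checked that

$$\limsup_{n \to \infty} \frac{1}{n^{1+\alpha/2}} \log \mathbb{P}\big(\mu_{\rho_n} \in B\big) \le -\Big((c\tau) \wedge \inf_{\nu \in B} \Phi(\nu)\Big).$$

Letting $\tau$ tend to infinity, we obtain the desired upper bound. The function $\Phi$ is a good rate
function (see e.g. [10, Theorem 4.2.1-(a)] or Lemma 3.14 below). □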

3.9. Proof of Theorem 1.1. Thanks to Proposition 2.1, all we have to show is that the
sequence of measures $\mu_{sc} \boxplus \mu_C$ satisfies an LDP in $\mathcal{P}(\mathbb{R})$ with speed $n^{1+\alpha/2}$, with the good rate
function $\Phi$ defined in Proposition 3.13. Since the map $\nu \mapsto \mu_{sc} \boxplus \nu$ is continuous in $\mathcal{P}(\mathbb{R})$, the
above is an immediate consequence of Proposition 3.13 and the standard contraction principle.
This ends the proof of Theorem 1.1.

3.10. On the rate function $\Phi$. We turn to a proof of the properties of the rate function listed
in Theorem 1.2 and Theorem 1.3.



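Lemma 3.14. For any $\beta \in (0, 2)$, $\tau > 1$, for any $\rho \in \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$, one has

$$\int |x|^\beta\, d\mu_\rho(x) \le \mathbb{E}_\rho \xi_\beta. \tag{57}$$

Proof. We use the following Schatten bound: for all $0 < p \le 2$,

$$\sum_{k=1}^n |\lambda_k(A)|^p \le \sum_{k=1}^n \Big(\sum_{j=1}^n |A_{jk}|^2\Big)^{p/2}, \tag{58}$$

for every hermitian matrix $A \in \mathcal{H}_n(\mathbb{C})$. For a proof, see Zhan [14, proof of Theorem 3.32]. For
$\rho \in \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$, there exists a sequence of matrices $H_n$ such that $\rho_n = U(H_n) \overset{\mathrm{loc}}{\rightsquigarrow} \rho$. Let $T_n^\theta$ be
the hermitian matrix associated to $(H_n)_\theta$, the truncated network. From (58) and (52), one has
for all $\theta > 0$:

$$\int |x|^\beta\, d\mu_{T_n^\theta}(x) \le \mathbb{E}_{\rho_n} \Big(\sum_v |\omega_{G_\theta}(o,v)|^2\Big)^{\beta/2} \le \mathbb{E}_{\rho_n}\big(\theta^{-\beta} \wedge \xi_\beta(G, o)\big).$$

For $\theta > 0$ the spectral measures $\mu_{T_n^\theta} = \mu_{(\rho_n)_\theta}$ have compact support uniformly in $n$. Thus,
letting $n$ go to infinity, from (51) one has

$$\int |x|^\beta\, d\mu_{\rho_\theta}(x) \le \mathbb{E}_\rho \xi_\beta. \tag{59}$$

On the other hand, by definition of $\mu_\rho$, see Lemma 3.11, one has $\mu_{\rho_\theta} \rightsquigarrow \mu_\rho$, $\theta \to 0$, and therefore

$$\int |x|^\beta\, d\mu_\rho(x) \le \liminf_{\theta \to 0} \int |x|^\beta\, d\mu_{\rho_\theta}(x).$$

This proves the claim (57). □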
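
The Schatten bound (58) used in the proof above can be sanity-checked numerically. The sketch below does so only for $2 \times 2$ hermitian matrices, where the eigenvalues are explicit; it is an illustration, not a proof, and the function names and the random sampling scheme are our own choices.

```python
import math
import random


def eigenvalues_2x2(a, c, d):
    """Eigenvalues of the Hermitian matrix [[a, c], [conj(c), d]], a and d real."""
    mid = (a + d) / 2
    rad = math.sqrt(((a - d) / 2) ** 2 + abs(c) ** 2)
    return mid - rad, mid + rad


def schatten_gap(a, c, d, p):
    """Right side minus left side of (58) for [[a, c], [conj(c), d]]."""
    lhs = sum(abs(lam) ** p for lam in eigenvalues_2x2(a, c, d))
    col0 = a * a + abs(c) ** 2      # squared norm of the first column
    col1 = abs(c) ** 2 + d * d      # squared norm of the second column
    return col0 ** (p / 2) + col1 ** (p / 2) - lhs


rng = random.Random(0)
worst = min(
    schatten_gap(rng.gauss(0, 1), complex(rng.gauss(0, 1), rng.gauss(0, 1)),
                 rng.gauss(0, 1), p)
    for _ in range(2000) for p in (0.5, 1.0, 1.5, 2.0)
)
print("smallest gap found:", worst)
```

For $p = 2$ the two sides of (58) coincide (both equal the squared Frobenius norm), so the smallest gap is zero up to rounding; for $p < 2$ the gap is nonnegative.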

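Proof of Theorem 1.2 (a). The proof is an immediate consequence of Lemma 3.14. Indeed,
from (56) and the definition of $\Phi$ it suffices to show that for any $\tau > 1$, for any $\rho \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)$,
one has

$$\int |x|^\alpha\, d\mu_\rho(x) \le \mathbb{E}_\rho \xi_\alpha. \tag{60}$$

This is the case $\alpha = \beta$ in (57). □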

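Proof of Theorem 1.2 (b). For $x \in \mathbb{R}$, let $g_x \in \mathcal{G}_*$ denote the network consisting of a single
vertex $o$ with weight $\omega(o, o) = x$. If $\nu \in \mathcal{P}(\mathbb{R})$, let $\rho \in \mathcal{P}(\mathcal{G}_*)$ denote the law $\rho = \int_{\mathbb{R}} \delta_{g_x}\, d\nu(x)$.
Notice that

$$\mathbb{E}_\rho \xi_\alpha = \int |x|^\alpha\, d\nu(x) = m_\alpha(\nu).$$

Thus, we can assume $\mathbb{E}_\rho \xi_\alpha < \infty$, otherwise there is nothing to prove. Since we assume $\operatorname{supp}(\vartheta_b) =
\{-1, +1\}$, one has that $\rho$ is admissible sofic, see Example 3.1, and $\rho \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)$ for some $\tau > 1$.
The spectral measure $\mu_\rho$ of $\rho$, defined as in Lemma 3.11, is easily seen to be $\mu_\rho = \nu$. Then
$\Phi(\nu) \le I(\rho) = b\, \mathbb{E}_\rho \xi_\alpha = b\, m_\alpha(\nu)$. □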

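Proof of Theorem 1.2 (c). Thanks to part (a) and part (b), all we need to prove is that

$$\Phi(\nu) \le \frac{a}{2}\, m_\alpha(\nu), \tag{61}$$

for all symmetric probabilities $\nu$ on $\mathbb{R}$.

For $z \in \mathbb{C}$, let $g_z \in \mathcal{G}_*$ denote the equivalence class of the two vertex network $(V, \omega, o)$, with
$V = \{o, 1\}$, $\omega(o, 1) = z$, $\omega(1, o) = \bar{z}$ and $\omega(o, o) = \omega(1, 1) = 0$. Fix some $e^{i\varphi} \in S_a = \operatorname{supp}(\vartheta_a)$,
let $T$ be a nonnegative random variable with some distribution $\mu_+$ on $[0, \infty)$, and let $p \in \mathcal{P}(\mathbb{C})$
denote the law of $T e^{i\varphi}$. The law

$$\rho = \frac{1}{2} \int_{\mathbb{C}} \big(\delta_{g_z} + \delta_{g_{\bar z}}\big)\, dp(z)$$

is sofic, see Example 3.2. A simple computation shows that the spectral measure of $\rho$ satisfies
$\mu_\rho = \mu_{\mathrm{sym}}$, where $\mu_{\mathrm{sym}}$ denotes the symmetric probability on $\mathbb{R}$ such that

$$\int f(x)\, d\mu_{\mathrm{sym}}(x) = \frac{1}{2} \int \big(f(x) + f(-x)\big)\, d\mu_+(x)$$

for all bounded measurable $f$.
To prove (61), let $\nu \in \mathcal{P}_{\mathrm{sym}}(\mathbb{R})$ and write $\mu_+$ for the law of $|X|$ when $X$ has law $\nu$. Then
$\nu = \mu_{\mathrm{sym}}$ and the associated $\rho$ satisfies $\mu_\rho = \nu$. Therefore

$$\Phi(\nu) \le I(\rho) = \frac{a}{2} \int_0^\infty x^\alpha\, d\mu_+(x) = \frac{a}{2}\, m_\alpha(\nu).$$

□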

Proof of Theorem 1.3 (a). We proceed as in the proof of Theorem 1.2 (b). Here $S_b = \{+1\}$ and
thus the law $\rho = \int_{\mathbb{R}} \delta_{g_x}\, d\nu(x)$ that we used there is not necessarily admissible sofic. However, it
is so if one assumes $\operatorname{supp}(\nu) \subset \mathbb{R}_+$. The rest of the argument applies with no modifications. □

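For the remaining statements, we use the following observation.
Lemma 3.15. If $\rho \in \mathcal{P}_{s,\beta,\tau}(\mathcal{G}_*)$ for some $\beta \in (1, 2)$, $\tau > 1$, then

$$\int_{\mathbb{R}} x\, d\mu_\rho(x) = \mathbb{E}_\rho\, \omega_G(o). \tag{62}$$

Proof. By definition of the spectral measure $\mu_{\rho_\theta}$, see (48), for every $\theta > 0$ one has

$$\int_{\mathbb{R}} x\, d\mu_{\rho_\theta}(x) = \mathbb{E}_{\rho_\theta}\, \omega_G(o) = \mathbb{E}_\rho\, \omega_{G_\theta}(o),$$

where $G_\theta$ is the truncation of $G$, see (30). The weights $\omega_{G_\theta}(o)$ satisfy $|\omega_{G_\theta}(o)| \le |\omega_G(o)|$ and,
since $\beta > 1$, $\mathbb{E}_\rho |\omega_G(o)| \le (\mathbb{E}_\rho \xi_\beta)^{1/\beta} < \tau^{1/\beta}$. Thus, by the dominated convergence theorem,

$$\lim_{\theta \to 0} \int_{\mathbb{R}} x\, d\mu_{\rho_\theta}(x) = \mathbb{E}_\rho\, \omega_G(o).$$

From (59), and the fact that $\beta > 1$, we know that the identity map $x \mapsto x$ is uniformly
integrable for $(\mu_{\rho_\theta})_{\theta > 0}$. Therefore, by definition of $\mu_\rho$, see Lemma 3.11, the limit above also
equals $\int_{\mathbb{R}} x\, d\mu_\rho(x)$. □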

Proof of Theorem 1.3 (b). In view of the bound (61), it suffices to show that if $\rho \in \mathcal{P}_s(\mathcal{G}_*)$ with
$\mu_\rho = \nu$, then

$$\frac{a}{2} \int |x|^\alpha\, d\mu_\rho(x) \le I(\rho). \tag{63}$$

Thanks to (56), one may assume that $\rho \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)$ for some $\tau > 1$. Moreover, by (56) and
(60), we know that (63) holds if $b \ge a/2$. If $b < a/2$ we proceed as follows. Since $\alpha > 1$ here,
we may apply Lemma 3.15 and obtain that

$$0 = \int_{\mathbb{R}} x\, d\nu(x) = \mathbb{E}_\rho\, \omega_G(o),$$

where we use the symmetry assumption on $\nu$. Since $S_b = \{+1\}$, one has that $\omega_G(o) \ge 0$ and
therefore $\omega_G(o) = 0$ $\rho$-a.s. In conclusion $I(\rho) = \frac{a}{2}\, \mathbb{E}_\rho \phi = \frac{a}{2}\, \mathbb{E}_\rho \xi_\alpha$, and the claim (63) follows from
(60). □



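Proof of Theorem 1.3 (c). Suppose that $I(\rho) < \infty$. Then by (56) one has $\rho \in \mathcal{P}_{s,\alpha,\tau}(\mathcal{G}_*)$ for
some $\tau > 1$. Since $\alpha > 1$, Lemma 3.15 yields $\int_{\mathbb{R}} x\, d\nu(x) = \mathbb{E}_\rho\, \omega_G(o)$ which, together with the
assumption $\int_{\mathbb{R}} x\, d\nu(x) < 0$, implies

$$\mathbb{E}_\rho\, \omega_G(o) < 0.$$

However, $S_b = \{+1\}$ implies that $\mathbb{E}_\rho\, \omega_G(o) \ge 0$, a contradiction. Thus, $I(\rho) = +\infty$, for all
$\rho \in \mathcal{P}_s(\mathcal{G}_*)$ such that $\mu_\rho = \nu$. □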
Appendix A. Uniform asymptotic freeness

A.1. Proof of Theorem 2.6. Recall the definition (13) of the function $g_\mu : \mathbb{C}_+ \to \mathbb{C}_+$, for a
given $\mu \in \mathcal{P}(\mathbb{R})$. Theorem 2.6 is a consequence of the following result.

Theorem A.1 (Uniform bound in subordination formula). Let $Y = (Y_{ij})_{1 \le i,j \le n} \in \mathcal{H}_n(\mathbb{C})$ be
a Wigner random matrix with $\operatorname{Var}(Y_{12}) = 1$, $\mathbb{E}|Y_{12}|^3 < \infty$ and $\mathbb{E}|Y_{11}|^2 < \infty$. There exists a
universal constant $c > 0$, such that for any integer $n \ge 1$, any $M \in \mathcal{H}_n(\mathbb{C})$, any $z \in \mathbb{C}_+$,
$\Im(z) \ge 1$,

\9{z) - g^,,{z + g{z))\ ^ c . 

where $g(z) = \mathbb{E}\, g_{\mu_{Y/\sqrt{n} + M}}(z)$.

Theorem A.1 is a small generalization of Pastur and Shcherbina [12, Theorem 18.3.1]. We
postpone its proof to the next subsection. We first check that it implies Theorem 2.6. This is
done by a standard contraction argument. For $z \in \mathbb{C}_+$, we define the $\mathbb{C}_+ \to \mathbb{C}_+$ map,

$$\phi_z : h \mapsto g_{\mu_M}(z + h). \tag{64}$$
It is Lipschitz with constant $1/\Im(z)^2$. In particular, if $\Im(z) \ge 2$, $\phi_z$ is a contraction with
Lipschitz constant $1/4$. Now, it is well known that if $\mu = \mu_M \boxplus \mu_{sc}$, we have for all $z \in \mathbb{C}_+$ the
subordination formula,

$$g_\mu(z) = g_{\mu_M}\big(z + g_\mu(z)\big) = \phi_z\big(g_\mu(z)\big),$$
see Biane [7]. In particular, if for some probability measure $\nu \in \mathcal{P}(\mathbb{R})$ and $\varepsilon \ge 0$,

$$\big|g_\nu(z) - g_{\mu_M}\big(z + g_\nu(z)\big)\big| \le \varepsilon,$$

then

$$|g_\mu(z) - g_\nu(z)| \le \varepsilon + \big|\phi_z(g_\mu(z)) - \phi_z(g_\nu(z))\big| \le \varepsilon + \frac{1}{\Im(z)^2}\, |g_\mu(z) - g_\nu(z)|.$$
So that, if $\Im(z) \ge 2$,

$$|g_\mu(z) - g_\nu(z)| \le \frac{4}{3}\, \varepsilon.$$

Hence from the definition of the distance $d(\mu, \nu)$ in (12), we see that Theorem 2.6 is a corollary
of Theorem A.1.
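
The contraction argument can be illustrated numerically. The following Python sketch iterates the map (64) for a measure $\mu_M$ with finitely many equally weighted atoms (standing for the eigenvalues of $M$); for $M = 0$ the fixed point should be the Stieltjes transform of the semicircle law, which solves $g^2 + zg + 1 = 0$ under the convention $g_\mu(z) = \int (x - z)^{-1} d\mu(x)$ that we assume here. Function names are hypothetical.

```python
import cmath


def stieltjes(atoms, w):
    """g(w) = mean of 1/(x - w) over the atoms x of a finite measure."""
    return sum(1 / (x - w) for x in atoms) / len(atoms)


def subordination_fixed_point(atoms, z, iterations=80):
    """Iterate h -> g_{mu_M}(z + h), the map (64), starting from h = 0.

    For Im(z) >= 2 the map is a contraction with constant at most 1/4,
    so the iterates converge geometrically to its unique fixed point.
    """
    h = 0j
    for _ in range(iterations):
        h = stieltjes(atoms, z + h)
    return h


def semicircle_stieltjes(z):
    """Root of g**2 + z*g + 1 = 0 with positive imaginary part."""
    root = cmath.sqrt(z * z - 4)
    candidates = ((-z + root) / 2, (-z - root) / 2)
    return max(candidates, key=lambda g: g.imag)


z = 0.7 + 2j
print(subordination_fixed_point([0.0], z), semicircle_stieltjes(z))
```

The same iteration with a nontrivial list of atoms gives the Stieltjes transform of $\mu_M \boxplus \mu_{sc}$ at $z$, and any approximate fixed point with residual $\varepsilon$ lies within $\frac{4}{3}\varepsilon$ of it, as in the display above.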

A.2. Proof of Theorem A.1: the Gaussian case. In this subsection, we assume that

(1) $G = (\Re(Y_{12}), \Im(Y_{12}))$ is a centered Gaussian vector in $\mathbb{R}^2$ with covariance $K \in \mathcal{H}_2(\mathbb{R})$,
$\operatorname{tr}(K) = 1$.

(2) $Y_{11}$ is a centered Gaussian in $\mathbb{R}$ with variance 1.

The proof is a variant of Pastur and Shcherbina [12, Lemma 2.2.3]. We first recall the
Gaussian integration by parts formula: for any continuously differentiable function $F : \mathbb{R}^2 \to \mathbb{R}$,
with $\mathbb{E}\|\nabla F(G)\|_2 < \infty$,

$$\mathbb{E} F(G) G = K\, \mathbb{E} \nabla F(G). \tag{65}$$

We identify $\mathcal{H}_n(\mathbb{C})$ with $\mathbb{R}^{n^2}$. Then, if $\Phi : \mathcal{H}_n(\mathbb{C}) \to \mathbb{C}$ is a continuously differentiable function,
we define $D_{jk} \Phi(X)$ as the derivative with respect to $\Re(X_{jk})$, and for $1 \le j \ne k \le n$, $D'_{jk} \Phi(X)$
as the derivative with respect to $\Im(X_{jk})$.

Define the resolvent $R(X) = (X - z)^{-1}$, $z \in \mathbb{C}_+$. From the resolvent formula

$$R(X + A) - R(X) = -R(X + A)\, A\, R(X), \tag{66}$$

valid for any matrix $A \in \mathcal{H}_n(\mathbb{C})$, a standard computation shows that if $1 \le j, k \le n$, and
$1 \le a \ne b \le n$, then

$$D_{ab} R_{jk} = -(R_{ja} R_{bk} + R_{jb} R_{ak}) \quad \text{and} \quad D'_{ab} R_{jk} = -i\,(R_{ja} R_{bk} - R_{jb} R_{ak}),$$

while if $1 \le a \le n$, then

$$D_{aa} R_{jk} = -R_{ja} R_{ak}.$$
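
The derivative formulas for the resolvent entries can be checked by finite differences on a small hermitian matrix. The sketch below is an illustration only: it uses a hand-written Gauss-Jordan inverse to stay within the Python standard library, and all function names are our own.

```python
def inverse(A):
    """Gauss-Jordan inverse of a small invertible complex matrix."""
    n = len(A)
    aug = [list(map(complex, A[i])) + [1 + 0j if i == j else 0j for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def resolvent(X, z):
    """R(X) = (X - z)^{-1}."""
    n = len(X)
    return inverse([[X[i][j] - (z if i == j else 0) for j in range(n)]
                    for i in range(n)])


def perturbed(X, a, b, h, imaginary=False):
    """Shift Re(X_ab) (or Im(X_ab)) by h, keeping X Hermitian; a != b."""
    Y = [list(map(complex, row)) for row in X]
    step = complex(1j * h if imaginary else h)
    Y[a][b] += step
    Y[b][a] += step.conjugate()
    return Y


def numeric_derivative(X, z, a, b, j, k, imaginary=False, h=1e-6):
    """Central finite difference of R_jk in Re(X_ab) or Im(X_ab)."""
    plus = resolvent(perturbed(X, a, b, h, imaginary), z)[j][k]
    minus = resolvent(perturbed(X, a, b, -h, imaginary), z)[j][k]
    return (plus - minus) / (2 * h)


def formula_derivative(X, z, a, b, j, k, imaginary=False):
    """Closed forms for D_ab R_jk and D'_ab R_jk stated above, a != b."""
    R = resolvent(X, z)
    if imaginary:
        return -1j * (R[j][a] * R[b][k] - R[j][b] * R[a][k])
    return -(R[j][a] * R[b][k] + R[j][b] * R[a][k])
```

Comparing `numeric_derivative` with `formula_derivative` over all index choices on a $3 \times 3$ example gives agreement up to the finite-difference error.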

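Set $X = Y/\sqrt{n} + M$, so that

$$R = \Big(\frac{Y}{\sqrt{n}} + M - z\Big)^{-1}.$$

Using (65) we get, for $1 \le a \ne b \le n$, and all $j, k$:

$$\mathbb{E} R_{jk} Y_{ab} = \frac{1}{\sqrt{n}}\, \mathbb{E}\big[K_{11} D_{ab} R_{jk} + K_{12} D'_{ab} R_{jk} + i K_{21} D_{ab} R_{jk} + i K_{22} D'_{ab} R_{jk}\big]$$
$$= -\frac{1}{\sqrt{n}}\, \mathbb{E}\big[(K_{11} - K_{22} + i K_{12} + i K_{21}) R_{ja} R_{bk} + (K_{11} + K_{22} - i K_{12} + i K_{21}) R_{jb} R_{ak}\big]$$
$$= -\frac{1}{\sqrt{n}}\, \mathbb{E}\big(\gamma R_{ja} R_{bk} + R_{jb} R_{ak}\big), \tag{67}$$

where at the last line, we have used the symmetry of $K$ and $\operatorname{tr}(K) = 1$, together with the
notation

$$\gamma = K_{11} - K_{22} + 2 i K_{12} = \mathbb{E} Y_{12}^2.$$

Notice that $|\gamma| \le 1$. Similarly, for $a = b$ one has

$$\mathbb{E} R_{jk} Y_{aa} = -\frac{1}{\sqrt{n}}\, \mathbb{E} R_{ja} R_{ak}. \tag{68}$$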
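Next, set

$$G(z) = (M - z)^{-1}.$$

Notice that in this case the dependency of $G(z)$ on $z$ is explicit in our notation. From the
resolvent formula (66),

$$R = G(z) - \frac{1}{\sqrt{n}}\, R\, Y\, G(z).$$
Hence, for $1 \le j, k \le n$, using (67)-(68),

$$\mathbb{E} R_{jk} = G(z)_{jk} - \frac{1}{\sqrt{n}} \sum_{1 \le a, b \le n} \mathbb{E}[R_{ja} Y_{ab}]\, G(z)_{bk}$$
$$= G(z)_{jk} + \frac{\gamma}{n} \sum_{1 \le a \ne b \le n} \mathbb{E}[R_{ja} R_{ba}]\, G(z)_{bk} + \frac{1}{n} \sum_{1 \le a, b \le n} \mathbb{E}[R_{jb} R_{aa}]\, G(z)_{bk}.$$

We set

$$g = g_{\mu_{Y/\sqrt{n} + M}}(z) = \frac{1}{n} \sum_{a=1}^n R_{aa}, \qquad \bar{g} = \mathbb{E} g, \qquad \underline{g} = g - \mathbb{E} g,$$

and consider the diagonal matrix $D$ with $D_{jk} = \mathbf{1}_{j=k} R_{jk}$. We find

$$\mathbb{E} R = G(z) + \mathbb{E}[g R]\, G(z) + \frac{\gamma}{n}\, \mathbb{E}\big[R (R^\top - D)\big]\, G(z).$$

Multiplying on the right hand side by $G(z)^{-1} = M - z$ and subtracting $\bar{g}\, \mathbb{E} R$ one has

$$\mathbb{E} R\, (M - z - \bar{g}) = I + \mathbb{E}\, \underline{g} R + \frac{\gamma}{n}\, \mathbb{E} R (R^\top - D).$$

Multiplying on the right hand side by $G(z + \bar{g})$,

$$\mathbb{E} R = G(z + \bar{g}) + \mathbb{E}\, \underline{g} R\, G(z + \bar{g}) + \frac{\gamma}{n}\, \mathbb{E} R (R^\top - D)\, G(z + \bar{g}).$$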
— in 
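Finally, multiplying by $\frac{1}{n}$ and taking the trace,

$$\bar{g} = g_{\mu_M}(z + \bar{g}) + \frac{1}{n}\, \mathbb{E}\, \underline{g} \operatorname{tr}\big[R\, G(z + \bar{g})\big] + \frac{\gamma}{n^2}\, \mathbb{E} \operatorname{tr}\big[R (R^\top - D)\, G(z + \bar{g})\big].$$

As a function of the entries of $Y$, $g$ has Lipschitz constant $O(n^{-1} \Im(z)^{-2})$. This fact can be
seen e.g. as in [3, Lemma 2.3.1]. Since the entries of $Y$ satisfy a Poincaré inequality, a standard
concentration bound implies

$$\mathbb{E}|\underline{g}| = O\big(n^{-1} \Im(z)^{-2}\big).$$
Also, since $|\operatorname{tr}(AB)| \le n \|A\| \|B\|$, we find

$$\Big|\frac{1}{n} \operatorname{tr} R\, G(z + \bar{g})\Big| \le \Im(z)^{-2} \quad \text{and} \quad \big|\operatorname{tr} R (R^\top - D)\, G(z + \bar{g})\big| \le 2 n\, \Im(z)^{-3}.$$
This concludes the proof of Theorem A.1 in the Gaussian case.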



A.3. Proof of Theorem A.1: the general case. Let $\underline{Y}_{ij} = Y_{ij} - \mathbb{E} Y_{12}$. Then $\underline{Y} - Y$ has
rank at most 1. Hence by Lemma B.1,

$$\big|g_{\mu_{Y/\sqrt{n} + M}}(z) - g_{\mu_{\underline{Y}/\sqrt{n} + M}}(z)\big| \le O\big((n\, \Im(z))^{-1}\big),$$

where we have used (14) and the fact that $f(x) = (x - z)^{-1}$ has a bounded variation norm
of order $\Im(z)^{-1}$. Also, we recall that the map $\phi_z$ defined by (64) is Lipschitz with constant
$1/\Im(z)^2$. Hence in order to prove Theorem A.1 we assume without loss of generality that the
off-diagonal entries of the matrix are centered: $\mathbb{E} Y_{12} = 0$.

We now check that the diagonal entries of $Y$ are negligible. Let $Y'$ be the matrix obtained
from $Y$ by setting the diagonal equal to zero: $Y'_{ij} = \mathbf{1}_{i \ne j} Y_{ij}$.

Lemma A.2 (Diagonal entries are negligible). For $z \in \mathbb{C}_+$, $\Im z \ge 1$,



Proof. From ()74p . we find 



{3xnzY 

Then by Lemma IB. 21 using Jensen inequality. 



□ 



As a consequence of Lemma A.2 we can assume without loss of generality that the diagonal entries of $Y$ are independent centered Gaussian with variance 1. By Subsection A.2 the conclusion of Theorem A.1 holds for the matrix $\hat{Y}$ whose off-diagonal entries are centered Gaussian random variables with covariance $K$, where $K$ is the covariance of $Y_{12}$, and with diagonal entries centered Gaussian with variance 1. Therefore, since the map $\phi_z$ defined by (64) is Lipschitz, in order to prove Theorem A.1 it is sufficient to establish that

^^,,(^) r- ^ ■ (69) 
We may repeat verbatim the interpolation trick in Pastur and Shcherbina [12, Theorem 18.3.1]. Consider the random matrix $\hat{Y}$, independent of $Y$, and for $0 \le t \le 1$, define the matrix

\[
Y(t) = \sqrt{t}\, Y + \sqrt{1 - t}\, \hat{Y}.
\]

Set $R(t) = (Y(t)/\sqrt{n} + M - zI)^{-1}$. Then, using the resolvent equation (66),

I d 

1 

trR{t)Y'{t)R{t)dt 



2n3/2 

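Next, consider the extension of (65) to an arbitrary centered random variable $G$ with covariance $K$. Namely, for any twice continuously differentiable function $F : \mathbb{R}^2 \to \mathbb{R}$, with $\mathbb{E}\|\nabla F(G)\|_2 < \infty$ and $\sup_{x \in \mathbb{R}^2} \|\mathrm{Hess} F(x)\| < \infty$, a Taylor expansion gives

\[
\mathbb{E} F(G) G = K\, \mathbb{E} \nabla F(G) + O\Big( \mathbb{E}\|G\|_2^3 \sup_{x \in \mathbb{R}^2} \|\mathrm{Hess} F(x)\| \Big).
\]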
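The expansion above can be obtained from a second-order Taylor expansion of $F$ at the origin. The following is a sketch of that computation, under the assumptions that $G$ is viewed as a centered vector in $\mathbb{R}^2$ with $K = \mathbb{E}[G G^{\top}]$ and finite third moment; the symbol $H$ for $\sup_x \|\mathrm{Hess} F(x)\|$ is introduced here for illustration only.

```latex
\begin{align*}
F(G) &= F(0) + \langle \nabla F(0), G \rangle + O\big(H \|G\|_2^2\big), \\
\mathbb{E}[F(G)\,G] &= F(0)\,\mathbb{E}G + \mathbb{E}[G G^{\top}]\,\nabla F(0) + O\big(H\,\mathbb{E}\|G\|_2^3\big)
  = K\,\nabla F(0) + O\big(H\,\mathbb{E}\|G\|_2^3\big), \\
K\,\mathbb{E}\nabla F(G) &= K\,\nabla F(0) + O\big(\|K\|\,H\,\mathbb{E}\|G\|_2\big),
\qquad \|K\|\,\mathbb{E}\|G\|_2 \le \mathbb{E}\|G\|_2^2\;\mathbb{E}\|G\|_2 \le \mathbb{E}\|G\|_2^3 .
\end{align*}
```

Subtracting the last two lines gives the stated error term; the final inequality uses $\|K\| \le \mathrm{tr}\, K = \mathbb{E}\|G\|_2^2$ together with Hölder's inequality.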



Since $Y$ and $\hat{Y}$ have the same first two moments, we get for all $t \in [0,1]$

^^^^ E \D'jkDURiX)\,l 



where $c > 0$ is a constant, and the second-order derivative ranges over $D_{jk}^2$, $D_{jk}'^2$ and $D_{jk} D_{jk}'$. However, it follows from (67)-(68) that



\D%D%{R{Xf)^j\ 



4 



is a finite linear combination of products of 4 resolvent entries of the form $\prod_{i=1}^{4} R(X)_{u_i v_i}$. Since for any $X \in \mathcal{H}_n(\mathbb{C})$, $|R(X)_{jk}| \le (\Im z)^{-1}$, one has, for some new constant $c > 0$ and for all $t \in [0,1]$:



Y Y 



Etri?^(i) — -Etri^^(^)- 



^ cn ■ 



Plugging this last upper bound in (70) concludes the proof of (69) and of Theorem A.1.



Appendix B. 

In this section we collect some standard facts that are repeatedly used in the main text. For probability measures $\mu, \mu' \in \mathcal{P}(\mathbb{R})$, the Kolmogorov-Smirnov (KS) distance is defined by

\[
d_{KS}(\mu, \mu') = \sup_{t \in \mathbb{R}} \big| \mu(-\infty, t] - \mu'(-\infty, t] \big|. \tag{71}
\]
The KS distance is closely related to functions with bounded variation. More precisely, for $f : \mathbb{R} \to \mathbb{R}$ the bounded variation norm is defined as

\[
\|f\|_{BV} = \sup \sum_{k \in \mathbb{Z}} |f(x_{k+1}) - f(x_k)|,
\]

where the supremum is over all sequences $(x_k)_{k \in \mathbb{Z}}$ with $x_n \le x_{n+1}$. If $f = \mathbf{1}_{(-\infty, t)}$ then $\|f\|_{BV} = 1$, while if the derivative of $f$ is in $L^1(\mathbb{R})$, we have $\|f\|_{BV} = \int |f'(x)|\, dx$. The KS distance is also given by the variational formula

\[
d_{KS}(\mu, \mu') = \sup\Big\{ \int f\, d\mu - \int f\, d\mu' : \|f\|_{BV} \le 1 \Big\}. \tag{72}
\]
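By homogeneity, the variational formula gives $|\int f\, d\mu - \int f\, d\mu'| \le \|f\|_{BV}\, d_{KS}(\mu, \mu')$ for any $f$ of bounded variation. A minimal numerical sketch for empirical measures follows; the sample sizes, the shift, the test function $\arctan$ and the helper name `ks_distance` are illustrative assumptions.

```python
import numpy as np

def ks_distance(x, y):
    """KS distance between the empirical measures of samples x and y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])  # the sup is attained at an atom of either measure
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(fx - fy)))

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = rng.standard_normal(500) + 0.3

d = ks_distance(x, y)
# f = arctan has derivative 1/(1+x^2) in L^1, so its BV norm is pi.
gap = abs(np.mean(np.arctan(x)) - np.mean(np.arctan(y)))
print(gap <= np.pi * d)
```

Evaluating both distribution functions only at the atoms is enough because their difference is a right-continuous step function that changes value only there.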

For $p \ge 1$ and $\mu, \mu' \in \mathcal{P}(\mathbb{R})$ such that $\int |x|^p\, d\mu(x)$ and $\int |x|^p\, d\mu'(x)$ are finite, their $L^p$-Wasserstein distance is defined as

\[
W_p(\mu, \mu') = \Big( \inf_{\pi} \int_{\mathbb{R} \times \mathbb{R}} |x - y|^p\, d\pi(x, y) \Big)^{1/p}, \tag{73}
\]

where the infimum is over all couplings $\pi$ of $\mu$ and $\mu'$ (i.e. $\pi$ is a probability measure on $\mathbb{R} \times \mathbb{R}$ whose first marginal is equal to $\mu$ and second marginal is equal to $\mu'$). Hölder's inequality implies that for $1 \le p \le p'$, $W_p \le W_{p'}$.

For any $p \ge 1$, if $W_p(\mu_n, \mu)$ converges to $0$ then $\mu_n \rightsquigarrow \mu$. This follows for example from the Kantorovich-Rubinstein duality

\[
W_1(\mu, \mu') = \sup\Big\{ \int f\, d\mu - \int f\, d\mu' : \|f\|_{\mathrm{Lip}} \le 1 \Big\}, \tag{74}
\]

where $\|f\|_{\mathrm{Lip}}$ denotes the Lipschitz constant of $f$.
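For empirical measures on the real line with equally many atoms, the infimum in the definition of $W_p$ is attained by pairing the sorted samples, which makes the ordering $W_1 \le W_2$ and the duality bound easy to check numerically. The sketch below relies on that standard one-dimensional fact; the helper name `wasserstein_p`, the samples and the 1-Lipschitz test function $\sin$ are illustrative assumptions.

```python
import numpy as np

def wasserstein_p(x, y, p):
    """W_p between two empirical measures on R with equally many atoms.

    On the real line the monotone coupling (sort both samples and pair them
    in order) is optimal for every p >= 1, so no optimisation is needed.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.size != y.size:
        raise ValueError("this sketch assumes the same number of atoms")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))

rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = 1.5 * rng.standard_normal(300) + 0.2

w1, w2 = wasserstein_p(x, y, 1), wasserstein_p(x, y, 2)
# sin is 1-Lipschitz, so by the duality its integral gap is at most W_1.
gap = abs(np.mean(np.sin(x)) - np.mean(np.sin(y)))
print(gap <= w1 <= w2)
```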

The following inequality is a standard consequence of interlacing, see e.g. [U Theorem A. 43]. 
Lemma B.1 (Rank inequality). If $A, B \in \mathcal{H}_n(\mathbb{C})$, then

\[
d_{KS}(\mu_A, \mu_B) \le \frac{1}{n}\, \mathrm{rank}(A - B).
\]
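The rank inequality can be illustrated on a random example. The sketch below builds a Hermitian matrix and a rank-$r$ Hermitian perturbation of it, then compares the eigenvalue counting functions; the Gaussian entries, the sizes `n` and `r`, the perturbation strength and the helper name `ks_counts` are illustrative assumptions.

```python
import numpy as np

def ks_counts(a, b):
    """Largest gap, in number of atoms, between the two eigenvalue counting functions."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    na = np.searchsorted(a, grid, side="right")
    nb = np.searchsorted(b, grid, side="right")
    return int(np.max(np.abs(na - nb)))

rng = np.random.default_rng(0)
n, r = 200, 3

Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (Z + Z.conj().T) / 2                      # Hermitian matrix
V = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
B = A + 5.0 * (V @ V.conj().T)                # B - A has rank r

k = ks_counts(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B))
print(k / n, "<=", r / n)
```

Here the KS distance between the two spectral measures equals `k / n`; counting atoms as integers avoids floating-point noise when comparing it with `r / n`.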

Next, we recall a very useful estimate which allows one to bound eigenvalue differences in terms of matrix entries. For a proof see e.g. [3, Lemma 2.1.19].

Lemma B.2 (Hoffman-Wielandt inequality). If $A, B \in \mathcal{H}_n(\mathbb{C})$, then

\[
W_2(\mu_A, \mu_B) \le \sqrt{\frac{1}{n}\, \mathrm{tr}\big[(A - B)^2\big]}.
\]
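A quick numerical check of the Hoffman-Wielandt inequality is sketched below. It uses the fact that, for two spectral measures with $n$ atoms each, $W_2$ is realised by pairing the eigenvalues in increasing order; the Gaussian matrices, the size `n`, the perturbation scale and the helper name `random_hermitian` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150

def random_hermitian(n, rng):
    """Illustrative helper: Hermitian matrix with Gaussian entries."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (Z + Z.conj().T) / 2

A = random_hermitian(n, rng)
B = A + 0.1 * random_hermitian(n, rng)

# eigvalsh returns eigenvalues in ascending order, so pairing them index by
# index is the monotone coupling, which realises W_2 on the real line.
lam_a, lam_b = np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)
w2 = np.sqrt(np.mean((lam_a - lam_b) ** 2))

# tr[(A - B)^2] equals the squared Frobenius norm for Hermitian A - B.
bound = np.sqrt(np.linalg.norm(A - B, "fro") ** 2 / n)
print(w2 <= bound)
```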